

Volume XXXII. Part I




JANUARY, 1941 











A THEORY OF RANDOMNESS
By M. G. KENDALL


INTRODUCTION 


1. In two recent papers Babington Smith & I (1938 and 1939) have discussed
the problems of sampling with random numbers and the construction of tables
of such numbers by mechanical methods. With the publication of 100,000
numbers (1940) what one may call the practical side of the investigation has
come to an end. The purpose of this paper is to develop the theory of the subject
and to put in their proper setting some of the ideas on which the practical
research was based. It is divided into two parts. In the first I deal with the
symbols and mathematics of the theory of random suites, my fundamental
contention being that a theory of randomness can be developed within the
framework of existing mathematical notions. The second part indicates how the
theory is to be related to practice.


2. Much of the following work was suggested by the writings of von Mises
(1936) and Dörge (1934), and I take the opportunity of expressing my
indebtedness to them. The principal difference between von Mises's views and my own
concerns his concept of the Irregular Kollektiv, or infinite random series.
Numerous attempts have been made to show that this concept leads to a
contradiction and that it is therefore an improper foundation for a theory of
probability. Such attempts have mostly failed, but under pressure of the criticisms
embodied in them the definition of the Irregular Kollektiv has been successively
modified by von Mises's followers until it has lost the pristine simplicity which
was originally one of its most attractive features. I do not propose to discuss
here the difficulties associated with the concept of the Irregular Kollektiv or the
various expedients which have been proposed to meet them. I have tried to cut
the Gordian knot by rejecting the concept, and the theory below accordingly
avoids all the difficulties attendant upon it.


Part 1. THE THEORY OF RANDOM SUITES


3. I consider a finite number $r$ of symbols $A_1, A_2, \ldots, A_r$, each of which will
be called a characteristic, and an infinite ordered series of these characteristics,
which will be called a suite. For instance, if there were two characteristics $A_1$
and $A_2$ such a suite might be

$$A_1 A_2 A_1 A_2 A_1 A_2 A_1 A_2 \ldots \tag{1}$$

where the characteristics appear alternately. Suites exist in the sense that they
can be completely specified by a law of formation, as the foregoing example
shows.

Biometrika xxxii


4. Definition. If the proportional frequency of each characteristic in a suite
tends to a limit in the mathematical sense, the suite is called “proper”. In the
contrary case, “improper”.

Proper suites exist; e.g. the proportional frequencies of $A_1$’s and $A_2$’s in (1)
tend each to the limit $\tfrac{1}{2}$.

Improper suites also exist. For example, it may be shown that if we take the
$k$th digit in the logarithms to base 10 of all the integers, beginning with 1, the
suite of the digits 0-9 so obtained is improper, since no proportional frequency
tends to a limit.
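
The oscillation behind this example can be observed numerically. The following is only a sketch under stated assumptions: it takes $k = 1$ (the first decimal place of the logarithm), looks at the digit 0 alone, and uses hypothetical helper names `first_log_digit` and `frequency_of_digit`. A finite computation of this kind illustrates the behaviour but does not prove that no limit exists.

```python
from fractions import Fraction

def first_log_digit(n):
    """First decimal digit of log10(n), computed exactly with integers.

    floor(10 * log10(n)) equals the number of decimal digits of n**10, less one;
    its last digit is the first decimal place of log10(n).
    """
    return (len(str(n ** 10)) - 1) % 10

def frequency_of_digit(digit, upto):
    """Proportional frequency of `digit` among the integers 1..upto."""
    hits = sum(1 for n in range(1, upto + 1) if first_log_digit(n) == digit)
    return Fraction(hits, upto)

# Two stopping points chosen for illustration: just below 10**4,
# and just below 10**4.1 (about 12589.25).
low = frequency_of_digit(0, 9_999)
high = frequency_of_digit(0, 12_589)
print(float(low), float(high))
```

The two frequencies differ widely, and the same swing recurs in every later decade, which is the behaviour the text describes.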

Suites also exist which are proper for one characteristic and improper for
another, provided that there are more than two characteristics. For we may
build a suite from the logarithm table in the manner just described, and then
insert a new characteristic Q between successive digits. The proportional
frequency of Q will then tend to $\tfrac{1}{2}$, but those of the others will not tend to a limit.
If, however, there are two characteristics, the proportional frequencies, being
together equal to unity, must tend to a limit together.


5. Definition. The limit of the proportional frequency of a characteristic in
a proper suite is called the probability of that characteristic in that suite. 


6. Definition. By a “Selector” I mean an infinite series of positive integers 
ordered according to their magnitude. A selector, being infinite, must be 
specified by a law of formation, not by enumeration. 

There is a special class of such laws which deserves separate consideration. 
Suppose we have such a suite as this: 


$$A_1 A_2 A_2 A_1 A_1 A_2 A_2 A_1 A_1 A_2 A_2 \ldots, \tag{2}$$


characteristics after the first appearing alternately in pairs. 

Our law of formation of the selector might be in these terms: proceed along
the series until you come to the combination $A_1 A_1 A_2 A_2$; then choose the ordinal
number of the next following member of the series, and proceed until you again
meet that combination; and so on. The series of ordinals so obtained is the
selector.

The importance of selectors of this type is that they are mathematically
independent of the particular characteristic of the member whose ordinal
number is chosen. By mathematically independent I mean that the value of this
member does not appear in the law of formation, so that the same member would
be chosen whatever its characteristic.

Definition. If a selector is constructed from a suite and, in virtue of the law
of formation, any member of the selector is mathematically independent of the
characteristic whose ordinal number in the suite is the value of that member,
the selector is said to be “disjoint” with respect to the suite.
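
A law of formation of this kind can be sketched in code. The sketch assumes a suite in which the characteristics after the first appear alternately in pairs, beginning with $A_1$, and the pattern $A_1 A_1 A_2 A_2$; both are illustrative choices, and the names `suite_2` and `disjoint_selector` are hypothetical. The point to notice is that the member whose ordinal is chosen is never inspected.

```python
def suite_2(n):
    """Characteristic at ordinal n (1-indexed): A1, then A2 A2, A1 A1, A2 A2, ..."""
    if n == 1:
        return "A1"
    return "A2" if ((n - 2) // 2) % 2 == 0 else "A1"

def disjoint_selector(prefix, pattern):
    """Ordinals (1-indexed) of the members immediately following `pattern`.

    Only the members forming the pattern are examined; the chosen member
    itself plays no part in the rule, which is what makes the selector disjoint.
    """
    k = len(pattern)
    ordinals = []
    for i in range(len(prefix) - k):
        if prefix[i:i + k] == pattern:
            ordinals.append(i + k + 1)
    return ordinals

prefix = [suite_2(n) for n in range(1, 21)]
selector = disjoint_selector(prefix, ["A1", "A1", "A2", "A2"])
degenerate = disjoint_selector(prefix, ["A1", "A1", "A1"])
print(selector, degenerate)
```

The second call uses a pattern that never occurs in this suite, so it yields no ordinals at all.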


7. It might happen that a law of formation of a disjoint selector was given
which did not in fact lead to a selector in the case of certain suites. For example,
with the suite (2), if we try to construct a selector by choosing ordinals
corresponding to the characteristics following three successive $A_1$’s, no ordinals
appear. Such a law I should regard as degenerate in relation to that suite, and
I exclude it from the domain of discussion from this point onwards. Hereafter,
in speaking of a selector in relation to a given suite I shall assume that the one
is disjoint with respect to the other.


8. We may now apply selectors to pick out subsets from a suite. We do so by
choosing from the suite those members whose ordinals are the numbers appearing
in the selector.

Ex hypothesi, the result of this process will be a new suite of the
characteristics (some at least) of the original suite. We may call this a “Derived suite”.
Symbolically, denoting the selector by the roman S and the suite by K, we may
write

$$D = SK. \tag{3}$$

I proceed to prove one or two theorems of a negative kind about derived
suites.

9. A suite derived from a proper suite is not necessarily proper.

For let
$$K = A_1 A_2 A_1 A_2 A_1 A_2 A_1 A_2 \ldots,$$
$$S = 1, 2, 4, 5, 7, 9, 11, 13, 15, 16, 18, 20, 22, \ldots,$$
the numbers running alternately in sets of even and odd, the number of each
kind being equal to twice the number of preceding members of the selector.
Then
$$SK = A_1 A_2 A_2 A_1 A_1 A_1 A_1 A_1 A_1 A_2 A_2 A_2 A_2 \ldots.$$
However far we go in this series, say to the end of a run of $A_1$’s, there will follow
twice as many $A_2$’s as there have already occurred of both $A_1$’s and $A_2$’s. Clearly
the suite is improper.
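
The construction can be followed numerically. This is a sketch only: it assumes the parent suite alternates $A_1$ (odd ordinals) and $A_2$ (even ordinals), and the names `build_selector` and `run_end_frequencies` are hypothetical.

```python
from fractions import Fraction

def build_selector(n_runs):
    """Selector 1, 2, 4, 5, 7, ...: runs of even and odd numbers alternately,
    each run twice as long as everything that precedes it."""
    members = [1]
    want_even = True
    for _ in range(n_runs):
        run_length = 2 * len(members)
        nxt = members[-1] + 1
        if (nxt % 2 == 0) != want_even:
            nxt += 1
        members.extend(range(nxt, nxt + 2 * run_length, 2))
        want_even = not want_even
    return members

def parent(n):
    """Assumed parent suite: A1 at odd ordinals, A2 at even ordinals."""
    return "A1" if n % 2 else "A2"

def run_end_frequencies(n_runs):
    """Proportional frequency of A1 in the derived suite at the end of each run."""
    members = build_selector(n_runs)
    freqs, end = [], 1
    for _ in range(n_runs + 1):
        hits = sum(1 for s in members[:end] if parent(s) == "A1")
        freqs.append(Fraction(hits, end))
        end *= 3  # each run triples the length reached so far
    return freqs

print([str(f) for f in run_end_frequencies(6)])
```

Under these assumptions the frequency of $A_1$ keeps swinging between values near 3/4 (at the end of a run of $A_1$’s) and near 1/4 (at the end of a run of $A_2$’s), so it cannot settle to a limit.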

10. If we apply another selector $S_2$ to a derived suite $S_1 K$ we get a further
derived suite which we may write $S_2 S_1 K$. It is clear that this will not in general
be the same as $S_1 S_2 K$.

The “identical” selector $E = 1, 2, 3, 4, \ldots$ is of some importance. It
reproduces a suite to which it is applied, and $ESK = SEK$, etc.


Randomness

11. Definition. If the probability of the characteristic $A_j$ in a proper suite
is $p$; and if the probability of $A_j$ in a proper suite derived from it by the selector
S is also $p$; then the suite is said to be random for the characteristic $A_j$ with
respect to S.

Suites and selectors with this property exist. Every proper suite is random
with respect to the identical selector E, and the suite (1) is random for both $A_1$
and $A_2$ with respect to the selector
$$1, 4, 7, 10, \ldots,$$
though not to the selector $1, 3, 5, 7, \ldots$.
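
These two statements are easy to check on a long prefix. The sketch below assumes that suite (1) alternates $A_1$ and $A_2$ and represents a selector as a function giving its $i$th member; the names are hypothetical.

```python
from fractions import Fraction

def suite_1(n):
    """Assumed suite (1): A1 at odd ordinals, A2 at even ordinals."""
    return "A1" if n % 2 else "A2"

def derived_frequency(suite, selector, characteristic, terms):
    """Frequency of `characteristic` among the first `terms` members of the derived suite."""
    picks = [suite(selector(i)) for i in range(1, terms + 1)]
    return Fraction(picks.count(characteristic), terms)

every_third = lambda i: 3 * i - 2   # 1, 4, 7, 10, ...
every_odd = lambda i: 2 * i - 1     # 1, 3, 5, 7, ...

print(derived_frequency(suite_1, every_third, "A1", 1000))
print(derived_frequency(suite_1, every_odd, "A1", 1000))
```

The first selector leaves the frequency of $A_1$ at one half; the second picks out $A_1$ alone, so the suite is not random with respect to it.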
12. Definition. A suite which is random for a characteristic A with respect
to a set of selectors $S_1, S_2, \ldots, S_m$ is said to be random in the selector domain
$S_1, S_2, \ldots, S_m$.

It is to be noted that if a suite is random with respect to $S_1$ and $S_2$ it does not
follow that $S_1 K$ is random with respect to $S_2$ or $S_2 K$ with respect to $S_1$. E.g. if


$$K = A_1 A_1 A_2 A_2 A_1 A_1 A_2 A_2 \ldots,$$
$$S_1 = 1, 3, 5, 7, 9, \ldots,$$
$S_2$ = the disjoint selector obtained by writing down the ordinals of
characteristics next following $A_1$,

then
$$S_1 K = A_1 A_2 A_1 A_2 A_1 A_2 \ldots,$$
$$S_2 K = A_1 A_2 A_1 A_2 A_1 A_2 \ldots,$$
so that K is random for $A_1$ and $A_2$ with respect to both $S_1$ and $S_2$. But
$$S_2 S_1 K = A_2 A_2 A_2 A_2 A_2 A_2 \ldots,$$
$$S_1 S_2 K = A_1 A_1 A_1 A_1 A_1 A_1 \ldots.$$
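
The failure of randomness under repeated selection can be reproduced on a finite prefix. The sketch rests on two assumptions that should be read as mine: the parent suite is taken to be $A_1 A_1 A_2 A_2$ repeated, and the law "choose the member next following an $A_1$" is re-applied to whichever suite it is given. The function names are hypothetical.

```python
from fractions import Fraction

def parent_prefix(length):
    """Prefix of the assumed parent suite A1 A1 A2 A2 A1 A1 A2 A2 ..."""
    block = ["A1", "A1", "A2", "A2"]
    return [block[i % 4] for i in range(length)]

def s1(seq):
    """Members at ordinals 1, 3, 5, 7, ..."""
    return seq[0::2]

def s2(seq):
    """Members next following an A1 (the chosen member itself is not inspected)."""
    return [seq[i + 1] for i in range(len(seq) - 1) if seq[i] == "A1"]

def frequency(seq, characteristic):
    return Fraction(seq.count(characteristic), len(seq))

K = parent_prefix(400)
print(frequency(s1(K), "A1"), frequency(s2(K), "A1"))          # single selections
print(frequency(s2(s1(K)), "A1"), frequency(s1(s2(K)), "A1"))  # repeated selections
```

Each selector alone preserves the frequency of $A_1$, but either one applied after the other produces a suite consisting of a single characteristic.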


13. Given a suite and a certain finite set of selectors, we may consider the
suites obtained by repeated applications of groups of these selectors. This will
give us a series of derived suites which may be infinite but is nevertheless
ordered. If all the resulting suites are proper and the probability of a
characteristic A in them all is the same as that in the parent suite, the latter is said to
be completely random in the selector domain $S_1, S_2, \ldots, S_m$.

There exist suites which are completely random in certain domains. E.g. if
$$K = A_1 A_2 A_1 A_2 A_1 A_2 \ldots,$$
$$S_1 = 1, 4, 7, 10, \ldots,$$
then
$$S_1 K = A_1 A_2 A_1 A_2 A_1 A_2 \ldots = K,$$
so that repeated applications of $S_1$ lead only back to the original suite.
Consider further
$$S_2 = 2, 5, 8, 11, \ldots.$$
Then
$$S_2 K = A_2 A_1 A_2 A_1 A_2 A_1 \ldots$$
and
$$S_2^2 K = A_1 A_2 A_1 A_2 A_1 A_2 \ldots.$$

Thus, any number of applications of $S_1$ and $S_2$ lead either to $A_1 A_2 A_1 A_2 A_1 A_2 \ldots$
or to $A_2 A_1 A_2 A_1 A_2 A_1 \ldots$, and hence the suite is completely random for $A_1$ and
$A_2$ with respect to $S_1$ and $S_2$.

It follows at once that any suite which is derived from a completely random
suite by a selector of the set is also completely random within the same domain.
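
The closure property claimed in this example can be checked by brute force over short products of selectors. The sketch assumes the alternating suite for K and the selectors $1, 4, 7, \ldots$ and $2, 5, 8, \ldots$, applied as fixed series of ordinals; only products of up to four selectors are tried, so this is a check and not a proof.

```python
from itertools import product

def K(n):
    """Assumed parent suite: A1 at odd ordinals, A2 at even ordinals."""
    return "A1" if n % 2 else "A2"

def S1(i):
    return 3 * i - 2   # 1, 4, 7, 10, ...

def S2(i):
    return 3 * i - 1   # 2, 5, 8, 11, ...

def apply(selector, suite):
    """Derived suite: its i-th member is the member of `suite` at ordinal selector(i)."""
    return lambda i: suite(selector(i))

def prefix(suite, length=8):
    """First `length` members, written as a string of subscripts."""
    return "".join(suite(i)[-1] for i in range(1, length + 1))

seen = set()
for depth in range(1, 5):
    for word in product((S1, S2), repeat=depth):
        suite = K
        for selector in reversed(word):   # the rightmost selector is applied first
            suite = apply(selector, suite)
        seen.add(prefix(suite))
print(seen)
```

Every derived suite begins either as K itself or as K with the two characteristics interchanged, in agreement with the text.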




14. The foregoing examples, though trivial enough, show that the various
ideas which have been introduced are not self-contradictory, and that they fall
within the scope of ordinary mathematical concepts. But the use of the word
“random” to describe a state of affairs which is the reverse of what is ordinarily
understood by the word requires some scholium. I have, for instance, remarked
that any suite is random with respect to the identical selector 1, 2, 3, 4, .... But
surely, it may be said, such a suite as $A_1 A_2 A_1 A_2 A_1 A_2 A_1 A_2 A_1 \ldots$ is the very
reverse of random, being as systematic as any such series can be? I will
anticipate a later part of this paper to some extent by a short explanation of this
point.


15. In statistical work we require one thing above all in a “random”
selection; namely, that if continued long enough it shall draw all members of the
universe equally often, or at least in a known proportion. In fact, it is not the
haphazard quality of randomness that we use in drawing inferences from random
samples, but the only thing about it which is not haphazard, namely its property
of producing definite limits. (I am, of course, speaking colloquially.) Any
method of selection would serve equally well if it satisfied this primary requisite.
The “random” series with which we are familiar in ordinary work are
unpurposive and chaotic in appearance for two reasons: firstly, because we fondly
hope to have a series which gives a suite random in regard to all possible
selectors, so that it can be used to draw random samples from all universes
whatever the characteristic under consideration; such a series must be random
in a very wide selector domain, including all the more obvious selectors which
would give the series a purposive appearance, and consequently it looks
unsystematic; secondly, as an experimental fact we have learnt that when a sample
is chosen haphazardly it is often random, at least so nearly so that ordinary
inspection of a series of results will not reveal the difference; this haphazard
selection leads us to expect from it an unpurposive-looking result.


16. But there is no virtue in lack of purpose for its own sake. In fact,
random sampling has a very definite purpose, and it is the purposive parts of it
that we have in mind in using the method at all. As the domain of selectors
becomes larger, the random suite becomes more and more like the completely
haphazard entity which von Mises would like to make the basis of his theory;
but in my view the random suite must always be considered as random in a
finite domain. I contend that there is no such thing as absolute randomness,
just as there is no such thing as absolute velocity. The latter has meaning only
with reference to a co-ordinate framework, the former only with reference to a
selector framework. I might summarize the attitude of the foregoing paragraphs
by saying that they are founded on the concept of the relativity of randomness.
If this be agreed, the difficulties about terming “random” certain series which
do not conform to the colloquial use of the word at once disappear.




Multi-dimensional suites
17. As a simple extension of the idea of a suite of characteristics we may
consider suites of sets of characteristics. Such an extension offers no difficulty, and
is very similar to the transition from describing points on a line in terms of one
co-ordinate to points in a multi-dimensional space by several co-ordinates.

We may amalgamate two or more suites into a suite of more dimensions.
E.g. with the suites
$$A_1 A_2 A_1 A_2 A_1 A_2 A_1 A_2 A_1 \ldots,$$
$$B_1 B_2 B_3 B_1 B_2 B_3 B_1 B_2 B_3 \ldots,$$
we can associate the $n$th member of one with the $n$th member of the other to
obtain
$$(A_1 B_1)(A_2 B_2)(A_1 B_3)(A_2 B_1)(A_1 B_2)(A_2 B_3) \ldots.$$


Convolution 


18. We may also construct an $m$-dimensional suite by dividing a
one-dimensional suite into blocks of $m$. This process is worth noticing. Consider the
suite
$$A_1 A_2 A_2 A_1 A_1 A_2 A_2 A_1 \ldots.$$
This is proper and each characteristic is random with respect to the selector
$1, 3, 5, 7, 9, \ldots$.

Now suppose we make a two-dimensional suite by bracketing successive terms,
thus:
$$(A_1 A_2)(A_2 A_1)(A_1 A_2)(A_2 A_1) \ldots.$$
This is proper with respect to the two two-dimensional characteristics $(A_1 A_2)$
and $(A_2 A_1)$ but it is not random with respect to the selector given.

Definition. I shall refer to the process of deriving a multi-dimensional suite
from a one-dimensional suite by grouping sets of successive terms as
“convolution”, and the derived suite will be said to be “convoluted”. From the example
given it is clear that a convoluted suite is not necessarily random in the domain
of randomness of the parent suite.
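
Convolution is simple to express on a finite prefix. The sketch assumes the parent suite $A_1 A_2 A_2 A_1$ repeated and the selector $1, 3, 5, \ldots$; the function names are hypothetical.

```python
from fractions import Fraction

def parent_prefix(length):
    """Prefix of the assumed parent suite A1 A2 A2 A1 A1 A2 A2 A1 ..."""
    block = ["A1", "A2", "A2", "A1"]
    return [block[i % 4] for i in range(length)]

def convolute(seq, m):
    """Group successive terms into blocks of m (an incomplete final block is dropped)."""
    return [tuple(seq[i:i + m]) for i in range(0, len(seq) - m + 1, m)]

def select_odd(seq):
    """Apply the selector 1, 3, 5, 7, ... to a sequence."""
    return seq[0::2]

def frequency(seq, characteristic):
    return Fraction(seq.count(characteristic), len(seq))

parent = parent_prefix(400)
pairs = convolute(parent, 2)

print(frequency(parent, "A1"), frequency(select_odd(parent), "A1"))
print(frequency(pairs, ("A1", "A2")), frequency(select_odd(pairs), ("A1", "A2")))
```

The parent keeps its frequencies under the selector, whereas the convoluted suite collapses to the single pair $(A_1 A_2)$.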


Independence 
19. Definition. If a two-dimensional suite is derived from two
one-dimensional suites by attaching one member of the first to the member with the same
ordinal number in the second; if the original suites and the new suite are proper,
and if the probability in the derived suite of a characteristic $(A_j B_k)$ is the
product of the probabilities of $A_j$ in the first suite and of $B_k$ in the second for all $j$
and $k$; then the two original suites are said to be statistically independent.

Definition. If from a proper two-dimensional suite there are derived two
proper one-dimensional suites by ignoring the first and then the second
characteristic of the pairs which constitute the suite; and if these two suites are
statistically independent, the two sets of characteristics are said to be statistically
independent in the original suite.




Statistical independence as thus defined concerns either suites or
characteristics in suites. Like probability it is a property of aggregates, not of
individuals.

20. The generalizations of statistical independence to the case of several
suites or multi-dimensional suites can be made without difficulty. I shall here
omit them and the theorems which they obey, since all the results are obvious
extensions of the theory of class frequencies set out for the case of finite classes
in the Introduction by Udny Yule and myself (1939). The following results are,
however, worth recalling:

(a) If K, L, M are proper suites, K is statistically independent of L and
L is statistically independent of M; it does not follow that K is statistically
independent of M.

(b) Three suites are statistically independent only if the probabilities of
$(A_j B_k C_l)$ are equal to the product of the probabilities of $A_j$ in K, $B_k$ in L, and $C_l$
in M. It is not sufficient that they should be statistically independent pair and
pair, as the following example shows:

$$K = A_1 A_2 A_1 A_2 A_1 A_2 A_1 A_2 \ldots,$$
L = B,B,B,B,B,B,B,B,..., 
M —0,0,0,0,0,0,0,0C,.... 
Here, for instance, the probability of (4,.B, Ον) is zero in the suite obtained by 
associating triads from members of the suites which have the same ordinal. 
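
The point of result (b) can be checked on a small periodic example. The three suites below are illustrative assumptions of my own and are not claimed to be the ones printed above; because each is periodic with period four, the limiting frequencies equal the frequencies within one period.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# One period of each hypothetical suite.
K = ["A1", "A2", "A1", "A2"]
L = ["B1", "B1", "B2", "B2"]
M = ["C1", "C2", "C2", "C1"]

def joint(*suites):
    """Frequencies of the combined characteristics formed from members with the same ordinal."""
    rows = list(zip(*suites))
    return {combo: Fraction(count, len(rows)) for combo, count in Counter(rows).items()}

def marginal(suite, characteristic):
    return Fraction(suite.count(characteristic), len(suite))

def independent(*suites):
    """True if every joint frequency equals the product of the separate frequencies."""
    probs = joint(*suites)
    for combo in product(*[sorted(set(s)) for s in suites]):
        expected = Fraction(1)
        for suite, characteristic in zip(suites, combo):
            expected *= marginal(suite, characteristic)
        if probs.get(combo, Fraction(0)) != expected:
            return False
    return True

print(independent(K, L), independent(K, M), independent(L, M))
print(independent(K, L, M))
print(joint(K, L, M).get(("A1", "B1", "C2"), Fraction(0)))
```

With these suites every pair is statistically independent, yet the triad $(A_1 B_1 C_2)$ never occurs although the product of the three separate probabilities is one eighth.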


Local randomness 


21. A suite as defined above is infinite. I now consider a finite series of
characteristics which I call a sequence. A sequence may be considered as a
section of a suite.

It is evident that any sequence, being finite, can form part of a suite in
which the probabilities have any given value and which is random in any
assigned domain. We may, however, imagine the selectors of the domain applied
to the sequence, that part of the selectors which contains numbers greater than
the number of members in the sequence being ignored. Similarly, we can
convolute the sequence in any way consistent with its size and apply selectors
to the sequences so derived. We can compare the actual proportions of
characteristics in these sequences with those in any given suite. Any such process
I call a test.

Definition. If the proportional frequencies in a sequence are approximately
what they would be in a suite random with respect to the selectors of the test,
the sequence is said to be locally random with respect to that test; and so for a
test domain.

To make this definition precise it is necessary to consider what is meant by
“approximately”. Suppose the sequence is of size $n$, and consider the $r^n$ possible
sequences of this size. To each there will correspond a proportional frequency
under the tests. Choose a number of these, $\alpha r^n$, which may be regarded as
“approximately” the same as the proportional frequency in the suite. Then if
the given sequence is one of these, it is “approximately” the same. Clearly the
word approximately depends on the choice of the number $\alpha$, which corresponds
to what is generally known as a “level of significance”.
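
One way of making this counting rule concrete is sketched below. It is an interpretation under several assumptions: the test is the plain proportional frequency of one characteristic, closeness is measured by absolute deviation from the suite probability $p$, ties are all admitted together, and the names `acceptance_threshold` and `is_locally_random` are hypothetical.

```python
import math
from fractions import Fraction
from itertools import product

def acceptance_threshold(n, symbols, target, p, alpha):
    """Smallest deviation d such that at least alpha * r**n of the r**n possible
    sequences of size n have |frequency - p| <= d."""
    deviations = sorted(
        abs(Fraction(seq.count(target), n) - p)
        for seq in product(symbols, repeat=n)
    )
    needed = math.ceil(alpha * len(deviations))
    return deviations[needed - 1]

def is_locally_random(sequence, symbols, target, p, alpha):
    """True if the sequence falls among the sequences admitted as 'approximately' right."""
    threshold = acceptance_threshold(len(sequence), symbols, target, p, alpha)
    deviation = abs(Fraction(sequence.count(target), len(sequence)) - p)
    return deviation <= threshold

symbols = ("A1", "A2")
p, alpha = Fraction(1, 2), Fraction(9, 10)
eight_heads = ("A1",) * 8 + ("A2",) * 2
nine_heads = ("A1",) * 9 + ("A2",)
print(acceptance_threshold(10, symbols, "A1", p, alpha))
print(is_locally_random(eight_heads, symbols, "A1", p, alpha))
print(is_locally_random(nine_heads, symbols, "A1", p, alpha))
```

The enumeration grows as $r^n$, so this form is practicable only for short sequences; it serves to show how the verdict depends on the chosen $\alpha$.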

The concept of local randomness, in my view, is important. The series of
characteristics which we encounter in real life are always sequences, not suites,
and we have to estimate probabilities and random properties from finite
aggregates, not from infinite series such as form the basis of the theory. Any finite
series of characteristics whatever is random in the sense that it might arise,
however infrequently, in random sampling. But in order to make any practical use
of our theory we have to consider certain series as non-random, or in other words
we have to judge from the local randomness of observed sequences.


Part 2. APPLICATION OF THE THEORY TO PRACTICE 


Events 

22. Events are the primary data of statistical experience. Every event has
a number of properties, the conceptual abstractions of the Gestalts which it
provides. These properties may be called characteristics, and it is with
aggregates of characteristics that statistical inference is concerned. The throwing of
a die and the growing of a crop on a given field are events. Characteristics of the
former would include the number which came uppermost, the time at which the
throw was made, the angles which the edges made with a line fixed in space, and
so on. In general an event has an infinite number of characteristics. When we
have a complex phenomenon such as a crop of wheat on a field it is a matter of
choice whether we regard the whole thing as one event, or look on it as a series
of associated events, e.g. a collection of crops on a number of square yards. But
the event is to be regarded as including the whole of the happening, and is not
synonymous with one of its characteristics. A yield of wheat is not an event,
nor is the number thrown by a die.


23. Consider then an aggregate of events. Suppose there exists a finite set 
of characteristics such that each event has one and only one characteristic of the 
set; for example, the events consisting of throws of an ordinary die must have 
one of the characteristics 1-6, according to the number which falls uppermost, 
and cannot have more than one. We can then say that the aggregate of events 
gives rise to an aggregate of characteristics. 


24. Now the aggregates of events we meet in experience are always finite.
It is true that we sometimes regard a line as composed of an infinite number of
points or a solid body as an infinite number of particles; but these are mental
fictions and it is not possible to observe the characteristics of an infinite number
of events. The finite aggregates of our experience can be ordered to produce a
sequence; in fact they are usually arranged for us by the temporal order in
which they occur. The fundamental problem of linking theory and practice
(and an analogous problem arises in all frequency theories of probability) is to
relate the sequences of observation with the suites of theory.


25. The sequences of observation may be regarded as generated by a
physical process. The sequence consisting of throws of a die, for example, may
be considered as defined by the rules under which the die is cast. A sequence of
crop yields is determined by the circumstances under which the crop was grown.
I shall assume that the physical process generating a sequence can be depicted
by a mathematical law defining a suite, in the same way that the “straight
lines” we draw on paper can be depicted by the straight lines of Euclidean
geometry, or a rigid body by the abstractions of mathematical dynamics.
Members of a sequence are ascertained by experiment; those of a suite by
calculation.


26. I also take it as empirically established that there are observational
sequences which can be adequately described as locally random sections of
suites; random, that is, in certain domains. And I assume that the processes
generating these sequences will, if continued, produce further sequences which
are also locally random. This is essential to all scientific inquiry, that a law
which is established will continue to operate. If it does not, we must alter the
law; but before carrying out the extra trials we can only act on the assumption
that the law will hold. Put in this way, perhaps, the assumption seems
unjustified, but it is made every moment of our lives. In writing these words I am
assuming that a past phenomenon will recur, namely that a particular
arrangement of marks on a piece of paper will evoke certain ideas in the reader’s mind.
This much I am compelled to concede to those writers, like Mr Keynes and Dr
Jeffreys, who contend that probability cannot be defined in terms of frequency,
namely that the uncertain attitude of mind which one adopts towards some
laws cannot be measured by probability as I have defined it. As to the
phenomenon of mental doubt, the scientific procedure by which it is removed or
strengthened, the desirability of measuring it, I am in agreement with Dr
Jeffreys, but I do not call the measure probability in a statistical context. This
is only a matter of words, but unfortunately so is a great deal of statistical
discussion.

27. The probability of a characteristic in a physical process is to be estimated
from the observed sequence to which the process gives rise. I need not dwell
here on the methods of estimation and the ideas which underlie them. But there
is one point of some importance to note. It is evident that probability is a
property of characteristics, not of events, and to be strictly accurate we should
always speak of it as such. For instance, if I toss a penny on to a chessboard, the
limit of the proportion of heads may be $\tfrac{1}{2}$, but the limit of the proportion of cases
in which it falls on a white square may be some other fraction. Neither of these
fractions is the probability of the event. They are probabilities of characteristics
and there is nothing inconsistent in the fact that they are different.


Independence 


28. The statistical independence of two suites, or of characteristics in a
multi-dimensional suite, was defined in paragraph 19, and the statistical
independence of observed sequences follows the same line. In statistics the question
whether two series of characteristics are independent is to be determined purely
from the experimental data. There may be very good reasons why one
characteristic is “dependent” on another in a causal sense, but if the occurrence of one
is not accompanied by the occurrence of the other in “unexpected” proportion
they are statistically independent. Contrariwise, there may be no obvious causal
nexus and yet the two may be statistically dependent. In fact, I would be
inclined to deny any separate meaning to “causal” dependence other than that of
statistical dependence (with, perhaps, allowance for the temporal element).


29. This point is important in one respect. I have up to the present spoken
only of the independence of characteristics, not of events, and even of the former
only in terms of suites or sequences. But in the theory of probability as
expounded in textbooks it is quite common to meet with such expressions as “a
series of independent events”, or “successive events are independent”. The
word “event” here means what I call a characteristic; but can we speak of “a
suite of independent characteristics”? I do not think so. In my opinion the
concept is equivalent to that of the Irregular Kollektiv of von Mises.


30. For example, one would be inclined to begin an approach to a definition
of the concept by requiring that each characteristic was followed equally
frequently by all characteristics of the suite, e.g. that in a suite of characteristics
$A_1$, $A_2$, an $A_1$ was followed equally frequently by an $A_1$ or an $A_2$. But this is true
of the suite
$$A_1 A_1 A_2 A_2 A_1 A_1 A_2 A_2 \ldots$$
which is clearly not of the type desired. One might then require that each
characteristic should, in addition, be followed next but one by all other
characteristics in equal amount. But this is true of the suite consisting of repetitions of
$$A_1 A_2 A_2 A_2 A_1 A_1 A_1 A_2.$$
Baffled by continual examples of this kind, one might then require that the
occurrence of any characteristic was to be independent of all or any of the
characteristics which have preceded it. This, on analysis, is found to be
equivalent to the requirement that the suite shall be random with respect to all
disjoint selectors of the type considered in paragraph 6; and this is precisely the
difficulty of the Irregular Kollektiv.




31. I can see no way round this difficulty; and I therefore reject the suite of
independent events as I reject the Irregular Kollektiv. This has two important
consequences, the first concerning the ordinary theorems of probability and the
second concerning Bernoulli’s theorem.

The first point may best be illustrated by an example: suppose the
probability of getting a head with a toss of a penny is $\tfrac{1}{2}$. What is the probability of
getting two heads with consecutive tosses? Anyone grounded in the classical
theory would answer “$\tfrac{1}{4}$” without hesitation. Nevertheless, the result is only
true under certain conditions. In fact the data of the problem are that there is
a suite of throws of the penny and that the proportional frequency of heads in
this suite is $\tfrac{1}{2}$. Now such a suite might be ($A_1$ = heads, $A_2$ = tails)
$$A_1 A_2 A_1 A_2 A_1 A_2 \ldots,$$
and the probability of getting two successive heads is zero. But, it may be
objected, this is an artificial series which would never occur. To this I should reply,
agreed, but why should there not occur a natural series in which the proportional
frequency of pairs of heads did not tend to $\tfrac{1}{4}$? It will, I hope, be clear on
reflection that there is nothing in the data of the problem to require the answer $\tfrac{1}{4}$
as a logical necessity unless we make some additional assumption such as this:
the occurrence of one characteristic is statistically independent of the occurrence
of the next. This contains the answer of $\tfrac{1}{4}$ implicitly.


32. In generalization of this problem we might ask: if the probability of the
characteristic is $p$, what is the probability that in a set of $n$ characteristics we
shall get $r$ successes and $n-r$ failures? Here, again, the answer of the classical
theory would be $\binom{n}{r} p^r (1-p)^{n-r}$; and here again the result is only true if we
assume the statistical independence of sets of $n$. Clearly if the result is to be
true for all $n$ we are once more verging on the suite of independent
characteristics referred to in paragraph 30. I conclude that for statistical purposes the
results of the classical theory of probability are not to be accepted without
examination. If in any particular case we require one of these results, we must
be satisfied that the suite we are considering is such as to justify the use of it.


Bernoulli's theorem 


33. The well-known theorem given by James Bernoulli in the Ars Conjectandi
is subject to similar limitations in regard to its statistical applications. In
essence the theorem is a proposition in algebra which may be stated thus: in the
binomial distribution $(p+q)^n$, if $u$ be the sum of the greatest term and the $n$
preceding and $n$ succeeding terms, the ratio of $u$ to the sum of the remaining
terms may be made as large as we please by increasing $n$ sufficiently. There can
be no criticism of this result. But, as applied to statistical series, the theorem
states that if the probability of a characteristic is $p$ and we observe $m$ sets of $n$
events, the proportion of sets in which the proportion of successes differs from
$p$ by less than $\epsilon$ tends to unity with large $m$ and large $n$. Or put another way,
the probability that the proportion of successes in a set of $m$ differs from $p$ by
less than $\epsilon$ tends to unity as $m$ tends to infinity. Symbolically,

$$P\{|p - h(A)| < \epsilon\} > 1 - \eta, \quad m > M,$$
where $h(A)$ is the proportion of successes.
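
The purely algebraic content of this statement can be tabulated exactly. The sketch assumes the classical binomial terms, that is, it presupposes the very independence which the following paragraphs call into question; the values $p = \tfrac{1}{2}$ and $\epsilon = \tfrac{1}{10}$ and the name `prob_within` are illustrative choices.

```python
from fractions import Fraction
from math import comb

def prob_within(m, p, eps):
    """Sum of the binomial terms for which |r/m - p| < eps, in exact arithmetic."""
    q = 1 - p
    total = Fraction(0)
    for r in range(m + 1):
        if abs(Fraction(r, m) - p) < eps:
            total += comb(m, r) * p**r * q**(m - r)
    return total

p, eps = Fraction(1, 2), Fraction(1, 10)
for m in (10, 100, 1000):
    print(m, float(prob_within(m, p, eps)))
```

The sum rises towards unity as $m$ increases, which is all that the algebraic proposition asserts.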


_ 94. This is a statement about the probability of a probability and I need not 
emphasize the logical weakness inherent in it. On looking into the proposition 
further we find that it is dependent on the type of assumption considered in 
paragraph 32, namely, that if the probability of a characteristic is p, the proba- 


bility of r characteristics in a set of n is (5 p ey) ade 


This will only be true if the observed series is, approximately at least, a series
of "independent" characteristics. Consider, for example, the suite

\[ A_1\,A_1\,A_2\,A_2\,A_1\,A_1\,A_2\,A_2\,\ldots, \]

the characteristics of which are random with respect to the selector

\[ 1,\ 3,\ 5,\ 7,\ 9,\ \ldots \]

and have probabilities each equal to ½. Consider the convoluted suite

\[ (A_1A_1)\,(A_2A_2)\,(A_1A_1)\,(A_2A_2)\,\ldots \]

Bernoulli's theorem, as usually stated, would lead us to the conclusion that the
probability of getting in this suite a pair containing one A_1 and one A_2 is ½.
Actually it is zero.
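
The point can be illustrated by a small simulation. This is only a sketch: it assumes the suite is formed by writing each randomly chosen characteristic twice, which gives a suite random with respect to the selector 1, 3, 5, ... but not under convolution into pairs. The seed and sample sizes are arbitrary.

```python
import random

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# 1 and 2 stand for the characteristics A1 and A2. Each randomly chosen
# characteristic is written twice in succession.
draws = [rng.choice((1, 2)) for _ in range(10_000)]
suite = [c for c in draws for _ in range(2)]

# Members picked out by the selector 1, 3, 5, ... (odd positions).
odd_members = suite[0::2]
freq_a1 = odd_members.count(1) / len(odd_members)

# Convolute into pairs and count those holding one A1 and one A2.
pairs = list(zip(suite[0::2], suite[1::2]))
mixed = sum(1 for x, y in pairs if x != y) / len(pairs)

# For comparison: pairs formed from independent characteristics.
independent = [rng.choice((1, 2)) for _ in range(20_000)]
ind_pairs = list(zip(independent[0::2], independent[1::2]))
ind_mixed = sum(1 for x, y in ind_pairs if x != y) / len(ind_pairs)

print(freq_a1, mixed, ind_mixed)
```

The frequency of A1 in the selected members is close to ½ in both cases, yet mixed pairs never occur in the doubled suite, whereas about half the pairs are mixed when the characteristics are independent.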

If, once again, it is objected that this is a highly artificial series, I reply as 
before that series with the same properties might arise naturally. We can only 
make Bernoulli's theorem legitimate by postulating that the suite to which it 
is applied shall have the property of randomness under convolution for all n. 
I do not think this is always a legitimate hypothesis to make, but I am anxious 
not to be misunderstood on the point. There undoubtedly exist sequences which 
can be regarded as belonging to suites random in a very wide domain, so wide
that for many practical purposes they can be taken to be sequences of "independent"
events. The point to be stressed is that this assumption underlies a
great deal of statistical work but is never brought to light, and indeed is often 
not realized. The statistician takes it for granted ; but to the philosopher nothing 
is more surprising than the orderly disorder which is common in Nature. 


Random sampling 
35. A statistical universe is an aggregate of objects, which may be finite or 
infinite. I consider selective processes which consist of abstracting one member 
at a time from this universe, and I suppose that each member is returned to the
universe after drawing if the universe is finite. The abstraction of each member
may then be considered as an event, whose characteristics may be noted. 
I assume that there exists for these events a set of characteristics such that each 
event must bear one characteristic and can bear only one.

We may then imagine this selective process, which I shall call sampling, as a 
generator of a sequence of any desired extent, and to be capable of continuation 
without limit. The result of unterminating sampling would be to give a suite of 
events, to which there correspond one or more suites of characteristics. In 
practice we shall have a sequence only. 


36. Definition. If a sequence obtained from a universe U by sampling is 
locally random for a characteristic A within a selector domain D, then the 
sampling is said to be random for U with respect to A within the domain D; 
and any member of the sequence is called a random sample for the characteristic
A in the domain D. 

This definition brings out the extremely relative nature of random sampling. 
A method which is random for one universe may not be random for another; a
method random for one characteristic may not be random for another, even in 
the same universe; and the randomness is always relative to the selector domain. 
It is also to be noticed that the sampling process, being physical, can only be 
related to sequences, not to suites. 

The assumption we make in using a random sampling method is that if it has 
in the past generated locally random sequences it will continue to do so in
similar circumstances. The justification for this assumption is empirical. 


37. In practice we sometimes draw samples one at a time and so obtain a 
one-dimensional sequence. We then convolute this sequence into groups of n, 
making an n-dimensional sequence. But we may also draw the samples in a 
block of n (which I shall call a "clutch"). The difference is of some importance.
If we ignore the order of the individuals in a convoluted sequence we have what
is virtually a clutch, and it is very common in statistical work to ignore the order
in this way. A series of sampling results, for instance, is frequently given without
any indication of the order in which the individual results appeared. It
should not be overlooked that certain information relevant to the randomness 
of the sample has disappeared in the process. For example, we may be told that 
in a sample of 1000 births 510 were male. We should probably conclude that there 
was nothing in the sample to show that it was not random. But if we know that 
the first 510 were male we should certainly conclude that it was not. 
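
A short sketch makes the loss of information concrete. The count of males is the same in both arrangements below, but a statistic that depends on order, here the number of runs, separates them at once. The helper `count_runs` and the shuffled comparison sequence are illustrative assumptions, not part of the original argument.

```python
import random

def count_runs(seq):
    """Number of maximal blocks of equal adjacent items."""
    return sum(1 for i, item in enumerate(seq) if i == 0 or item != seq[i - 1])

# The sample of the text: 1000 births, the first 510 male.
ordered = ["M"] * 510 + ["F"] * 490

# The same clutch in a haphazard order (seed chosen arbitrarily).
shuffled = ordered[:]
random.Random(1).shuffle(shuffled)

print(ordered.count("M"), shuffled.count("M"))
print(count_runs(ordered), count_runs(shuffled))
```

Once the order is discarded the two samples are indistinguishable, which is the information said above to have disappeared.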


38. What are the grounds on which a selective process of experience is con- 
sidered to give random samples? In the first place, as has already been re- 
marked, we can only use a selective process to produce a finite sequence. This 
sequence is always locally random in some domain or other. If we find that, as
the sequence is increased, local randomness is maintained, we may say that the
method is random for the universe and characteristic considered. But we require
more for a method to be used in practice. We require to be able to suppose with 
some confidence a priori that it will be random for fresh inquiries, fresh universes,
fresh characteristics and fresh sequences. And we require the domain of random- 
ness to be as wide as possible. It was formerly the custom to assume randomness 
to any desired extent if there was no obvious reason to the contrary—a sort of 
Principle of Non-sufficient Reason. This is most unsatisfactory, and could only 
be justified if it was found in practice that haphazard methods of selection give 
locally random sequences. In fact we find that whenever any element of
personal choice is allowed free play, bias is very liable to appear. 


39. It seems to me that we can never rid ourselves entirely of the possibility 
that a method of selection may lack randomness; but we can safeguard against 
the possibility to a great extent. For instance, the method of Random Sampling 
Numbers applied to a universe of names in a directory gives us something near 
certainty (if I may be allowed that colloquial expression) that the resulting 
samples will be random. Furthermore, we can experiment with a method to see 
if bias has appeared. If it has not, we are justified in expecting that it is random
for the class of cases in which it has been tried. Ultimately, however, the 
assumption of randomness is part of the hypothesis which is being tested. 


40. An assumption which is usually made in practice is that the method is
random within whatever domain happens to suit the investigator at the 
moment. One draws a random sample from the universe of inhabitants of the 
British Isles. One says that the sample is “random” without any qualification. 
Behind this lies the assumption of the Irregular Kollektiv which has been con- 
sidered from a different angle in paragraph 30. A great many statisticians would 
use such a sample to test any hypothesis about the universe which they chanced 
to encounter; they would assume that it was random in regard to height, sex, 
age or any other characteristic; they would assume that it was random under 
convolution; and they would assume the legitimacy of testing in any sampling 
distribution which happened to be convenient. All of this amounts to an
assumption of randomness in a very wide domain, depending on a subjective
judgment which may be quite wrong. The wider the domain, the less likely
(again speaking colloquially) is the assumption to be justified. In practice this 
assumption frequently has to be made, and can be made without much danger 
with a good sampling method. But the greatest danger lies in the fact that the 
person making the assumption very often does not even realize that he is doing 
so. In any sampling inquiry it is necessary to ask oneself, Is the sampling method
I am using random for the universe I am considering, for the characteristics I am 
discussing, and for the sampling distributions or tests of significance I am em- 
ploying? Randomness is relative. 




REFERENCES

DÖRGE, K. (1934). "Eine Axiomatisierung der von Miseschen Wahrscheinlichkeitstheorie." Jber. dtsch. MatVer. 43, 39.
KENDALL, M. G. & BABINGTON SMITH, B. (1938). "Randomness and random sampling numbers." J.R. Statist. Soc. 101, 147.
KENDALL, M. G. & BABINGTON SMITH, B. (1939). "Second paper on random sampling numbers." Supp. J.R. Statist. Soc. 6, 51.
KENDALL, M. G. & BABINGTON SMITH, B. (1940). Tables of Random Sampling Numbers. Cambridge University Press.
VON MISES, R. (1936). Wahrscheinlichkeit, Statistik und Wahrheit. Springer, Berlin.
YULE, G. UDNY & KENDALL, M. G. (1939). An Introduction to the Theory of Statistics. 12th edition. Griffin & Co., London.








A GEOMETRICAL ANALYSIS OF THE FREQUENCY 
DISTRIBUTION OF THE RATIO BETWEEN TWO 
VARIABLES


By C. NICHOLSON, M.C., M.A., M.D.


This subject has already been treated by Geary (1930), and by Fieller (1932), the
approach to the problem in both cases being algebraical; the geometrical approach
to the same problem suggested itself to me when I was working on a series of
anatomical measurements (Nicholson, 1938). This paper could hardly have
reached publication without the generous assistance of Mr N. L. Johnson of
University College, London.


(1) VARIABLES INDEPENDENT 
We are to consider the distribution of the ratio

\[ \frac{y+Y}{x+X}, \]

where Y and X are constants, and the joint distribution of x and y is given by the
normal bivariate surface

\[ z=\frac{1}{2\pi\sigma_x\sigma_y}\exp\left[-\frac12\left\{\frac{x^2}{\sigma_x^2}+\frac{y^2}{\sigma_y^2}\right\}\right]. \]

Then if we refer our observed values (y + Y) and (x + X) to the axes y = 0 and
x = 0, the co-ordinates of the intersection of the zero values of the variables will
be −X and −Y, since x + X = 0 when x = −X. The ordinates for a constant
value of the ratio will lie in a plane surface passing through this point of the
general form

\[ y=mx+c, \]

where m will have the value of the ratio, and c will be mX − Y. The equation to
the projection of the section of the normal surface by this plane on the (x, z)
plane will be

\[ z=\frac{1}{2\pi\sigma_x\sigma_y}\exp\left[-\frac12\left\{\frac{x^2}{\sigma_x^2}+\frac{(mx+c)^2}{\sigma_y^2}\right\}\right], \tag{1} \]


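which can be rearranged to

\[ z=\frac{1}{2\pi\sigma_x\sigma_y}\exp\left[-\frac12\left\{\frac{m^2\sigma_x^2+\sigma_y^2}{\sigma_x^2\sigma_y^2}\left(x+\frac{mc\,\sigma_x^2}{m^2\sigma_x^2+\sigma_y^2}\right)^2+\frac{c^2}{m^2\sigma_x^2+\sigma_y^2}\right\}\right]. \tag{1a} \]

The area of this section is the integration of z from −∞ to +∞, the variable
(because of the change of angle) being x√(1 + m²), and this is

\[ \frac{\sqrt{1+m^2}}{\sqrt{2\pi}\,\sqrt{m^2\sigma_x^2+\sigma_y^2}}\exp\left[-\frac12\,\frac{c^2}{m^2\sigma_x^2+\sigma_y^2}\right]. \tag{2} \]

The quantity $c/\sqrt{m^2\sigma_x^2+\sigma_y^2}$ in (2) is equal to $(mX-Y)/\sqrt{m^2\sigma_x^2+\sigma_y^2}$, which is the function of
the ratio which Geary treats as t. Now m is the tangent of an angle, say β, and (2)
may be put in the form

\[ \frac{1}{\sqrt{2\pi}\,\sqrt{\sigma_x^2\sin^2\beta+\sigma_y^2\cos^2\beta}}\exp\left[-\frac12\,\frac{c^2\cos^2\beta}{\sigma_x^2\sin^2\beta+\sigma_y^2\cos^2\beta}\right]. \tag{3} \]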
Here c cos β is the perpendicular distance from the origin of the surface to the
plane y = mx + c, so that we can draw the conclusion that a series of parallel plane
sections of a normal surface making an angle of β with the x axis are normal
curves with a standard deviation of

\[ \frac{\sigma_x\sigma_y}{\sqrt{\sigma_x^2\sin^2\beta+\sigma_y^2\cos^2\beta}}, \]

while their areas form another normal curve with the variable c cos β and with a
standard deviation of $\sqrt{\sigma_x^2\sin^2\beta+\sigma_y^2\cos^2\beta}$.
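
The area formula (3) can be checked numerically. The following is a sketch only: the function names and the values of σ_x, σ_y, m and c are illustrative assumptions, and the integral is taken by a simple midpoint rule over a range wide enough to hold the whole section.

```python
from math import atan, cos, exp, pi, sin, sqrt

def surface(x, y, sx, sy):
    """Ordinate of the normal bivariate surface for independent variables."""
    return exp(-0.5 * (x * x / sx**2 + y * y / sy**2)) / (2 * pi * sx * sy)

def section_area_numeric(m, c, sx, sy, half_width=60.0, steps=20_000):
    """Midpoint-rule area of the section by the plane y = m*x + c,
    measured along the plane, i.e. with element sqrt(1 + m^2) dx."""
    dx = 2 * half_width / steps
    total = 0.0
    for i in range(steps):
        x = -half_width + (i + 0.5) * dx
        total += surface(x, m * x + c, sx, sy)
    return total * dx * sqrt(1 + m * m)

def section_area_formula(m, c, sx, sy):
    """Area according to (3), with beta the angle whose tangent is m."""
    beta = atan(m)
    var = sx**2 * sin(beta) ** 2 + sy**2 * cos(beta) ** 2
    return exp(-0.5 * (c * cos(beta)) ** 2 / var) / sqrt(2 * pi * var)

sx, sy, m, c = 2.0, 1.0, 0.7, 1.5  # illustrative values only
numeric = section_area_numeric(m, c, sx, sy)
closed = section_area_formula(m, c, sx, sy)
print(numeric, closed)
```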


(2) VARIABLES CORRELATED 
Clearly, in the case where the primary variables are not independent, it is still
possible to reduce the distribution to this same geometrical system, so that we
may discard reference to the primary variables and refer rather to the principal
axes of the surface, generalizing (3) as

\[ \frac{1}{\sqrt{2\pi}\,\sqrt{a^2\sin^2(\alpha+\theta)+b^2\cos^2(\alpha+\theta)}}\exp\left[-\frac12\left\{\frac{k\sin\theta}{\sqrt{a^2\sin^2(\alpha+\theta)+b^2\cos^2(\alpha+\theta)}}\right\}^2\right], \tag{4} \]

where, referring to Fig. 1, a and b are the standard deviations in the direction of
the major and minor axes respectively of a normal surface; also k is the distance
of a point K from the origin, K being the focus of a pencil of planes cutting the
surface, and k being equal to √(X² + Y²). The angle KOA is α, which is the absolute
value of the angle less than ½π between the major axis of the surface and the line
joining the origin to the intersection of the zero value of the variables; it is

\[ \left|\tan^{-1}(Y/X)-\delta\right|, \]

where δ is the angle between the x axis and the major axis which is given by the
equation

\[ \tan 2\delta=\frac{2r\sigma_x\sigma_y}{\sigma_x^2-\sigma_y^2}. \]

θ is the angular deviation of any plane from the angle α which is taken as the
origin of the pencil, and the value of any ratio will be given by tan(α + δ + θ).




It is clear that if we were to confine ourselves to the distribution of the ratio
we must use tan(α + δ + θ) as the variable, but in this generalized form it is more
informative to use the angle itself as the variable. The cumulative frequency for a
deviate θ is the content of the two plane angles PKT and P'KT', and where K lies
without the bulk of the distribution the content of the angle PKT will be negligible.
The content of the angle P'KT' may then be taken as the content between





Fig. 1. Projection of normal bivariate surface to illustrate the geometry of 
the ratio between variables. 




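the two parallel planes, TKT' and SOS', the latter passing through the origin of
the surface, and this is

\[ \frac{1}{\sqrt{2\pi}\,\sqrt{a^2\sin^2(\alpha+\theta)+b^2\cos^2(\alpha+\theta)}}\int_0^{k\sin\theta}\exp\left[-\frac12\,\frac{u^2}{a^2\sin^2(\alpha+\theta)+b^2\cos^2(\alpha+\theta)}\right]du. \tag{5} \]

On taking $u/\sqrt{a^2\sin^2(\alpha+\theta)+b^2\cos^2(\alpha+\theta)}$ as the variable of integration in (5), we get

\[ \frac{1}{\sqrt{2\pi}}\int_0^{\,k\sin\theta/\sqrt{a^2\sin^2(\alpha+\theta)+b^2\cos^2(\alpha+\theta)}}e^{-\frac12u^2}\,du, \tag{5a} \]

where, if the variables are independent,

\[ \frac{k\sin\theta}{\sqrt{a^2\sin^2(\alpha+\theta)+b^2\cos^2(\alpha+\theta)}}=\frac{c\cos(\alpha+\theta)}{\sqrt{a^2\sin^2(\alpha+\theta)+b^2\cos^2(\alpha+\theta)}}=\frac{mX-Y}{\sqrt{m^2\sigma_x^2+\sigma_y^2}}, \]

so that (5) is identical with Geary's formula.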





Continuing to regard θ as the variable, the equation to the curve for the
frequency distribution given by (5) is the moment of the area (4) about K; and this
can be arrived at either geometrically or by differentiating (5) with regard to θ;
it is

\[ y=\frac{1}{\sqrt{2\pi}}\left[\frac{k\,\{a^2\sin\alpha\sin(\alpha+\theta)+b^2\cos\alpha\cos(\alpha+\theta)\}}{\{a^2\sin^2(\alpha+\theta)+b^2\cos^2(\alpha+\theta)\}^{3/2}}\right]\exp\left[-\frac12\left\{\frac{k\sin\theta}{\sqrt{a^2\sin^2(\alpha+\theta)+b^2\cos^2(\alpha+\theta)}}\right\}^2\right]. \tag{6} \]

Here the numerator of the factor within brackets may be put in the form

\[ k\sqrt{a^4\sin^2\alpha+b^4\cos^2\alpha}\;\sin(\alpha+\gamma+\theta), \tag{7} \]

where γ is the absolute value of the angle which the axis conjugate to POP' makes
with the major axis of the ellipse, i.e. where tan γ = (b²/a²) cot α. It is thus seen
that the curve is limited, the range of θ being from −(α + γ) to π − (α + γ); at these
angles the value of y is zero.
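
The statements about (6) and (7) can be tested numerically. The sketch below assumes the readings of (5a), (6) and (7) as the normal integral up to k sin θ divided by the root of a² sin²(α + θ) + b² cos²(α + θ), its derivative, and the sine form of the numerator; the function names are hypothetical, and the constants are those quoted for Fig. 2 (a = 4, b = 1, k = 3σ_α, α = 80°).

```python
from math import atan, cos, erf, exp, pi, radians, sin, sqrt, tan

def spread(theta, a, b, alpha):
    """a^2 sin^2(alpha+theta) + b^2 cos^2(alpha+theta)."""
    return a**2 * sin(alpha + theta) ** 2 + b**2 * cos(alpha + theta) ** 2

def cumulative_5a(theta, a, b, k, alpha):
    """Content (5a): normal integral from 0 to k sin(theta)/sqrt(spread)."""
    t = k * sin(theta) / sqrt(spread(theta, a, b, alpha))
    return 0.5 * erf(t / sqrt(2))

def numerator(theta, a, b, k, alpha):
    """Numerator of the bracketed factor in (6)."""
    return k * (a**2 * sin(alpha) * sin(alpha + theta)
                + b**2 * cos(alpha) * cos(alpha + theta))

def density_6(theta, a, b, k, alpha):
    """Frequency curve (6) for the angle theta."""
    s = spread(theta, a, b, alpha)
    return (numerator(theta, a, b, k, alpha) / s**1.5
            * exp(-0.5 * (k * sin(theta)) ** 2 / s) / sqrt(2 * pi))

def numerator_7(theta, a, b, k, alpha):
    """The same numerator in the form (7)."""
    gamma = atan((b**2 / a**2) / tan(alpha))
    amplitude = sqrt(a**4 * sin(alpha) ** 2 + b**4 * cos(alpha) ** 2)
    return k * amplitude * sin(alpha + gamma + theta)

a, b, alpha = 4.0, 1.0, radians(80.0)
sigma_alpha = a * b / sqrt(a**2 * sin(alpha) ** 2 + b**2 * cos(alpha) ** 2)
k = 3 * sigma_alpha
gamma = atan((b**2 / a**2) / tan(alpha))

theta = 0.3   # an arbitrary deviate, in radians
step = 1e-5   # step for the central difference
slope = (cumulative_5a(theta + step, a, b, k, alpha)
         - cumulative_5a(theta - step, a, b, k, alpha)) / (2 * step)
lower_limit = -(alpha + gamma)
print(slope, density_6(theta, a, b, k, alpha))
print(density_6(lower_limit, a, b, k, alpha))
```

With these constants the lower limit falls near −80.6° and the upper near 99.4°.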

It should be noted that the content between the planes SOS' and TKT' is
equal to the content of the angle P'KT' minus the content of the angle PKT, so
that in the integration the value of the content of the angle PKT is twice neglected.
It follows that if we integrate (6) between the limiting values of θ the total amount
neglected is twice that part of the surface which lies beyond a plane GKG' which
passes through K and makes an angle of γ with the major axis; the value is

\[ \frac{2}{\sqrt{2\pi}}\int_{k/\sigma_\alpha}^{\infty}e^{-\frac12u^2}\,du, \]

where σ_α is the standard deviation of the normal curve given by a plane section
of the normal surface which makes an angle of α with the major axis, i.e. where

\[ \sigma_\alpha=\frac{ab}{\sqrt{a^2\sin^2\alpha+b^2\cos^2\alpha}}. \]





(3) CURVES GIVEN BY THIS EQUATION


The curves generated by this equation are of very great variety, the majority 
being bell-shaped, and we may now discuss the effects produced by varying the 
constants. It should be noted that, as POP’ bisects the normal surface, the origin 
of the curve is neither the mean nor the mode but the median. 

"k" may have any value from 3σ_α or 4σ_α up to infinity; as k tends to infinity
the distribution of the standardized variable θ tends to normality with a standard
deviation of

\[ \frac1k\sqrt{a^2\sin^2\alpha+b^2\cos^2\alpha}. \]


As k decreases in value the departure from the form of the normal curve becomes 
more marked. 




“a/b” may have any value from unity to infinity. When the value is unity the 
curve is very similar in form to the normal curve; as the value of a/b increases 
departure from the form of the normal curve increases. 

Fig. 2. Curve from equation (6), plotted against θ in degrees. Constants: a = 4, b = 1, k = 3σ_α, α = 80°.

On the value of α the symmetry of the curve depends; α may have any value
between 0 and ½π. At zero the curve is symmetrical but steeper than the normal
curve. As α increases asymmetry develops (asymmetry being taken to mean the
excentricity of the median), the maximum excentricity being reached at a value
of tan⁻¹(b/a); thereafter the curve returns gradually to symmetry at a value of
½π, where it is flatter than the normal curve. Skewness develops with asymmetry
but more slowly, and the maximum skewness is not reached until « has a value of 


i. . ὃ 

If k is relatively small, i.e. is near 3σ_α, and a/b is large, and if α is near ½π, we
get a curve with a maximum value on each side of the median, symmetrical when
α is ½π. This distribution does occasionally arise in practice; an example is given
by Udny Yule (1932). Fig. 2 is an example of a slightly asymmetrical curve of
this type.

(4) A GENERAL SOLUTION 

K may very well occupy a position within the bulk of the surface; this must
happen when both of the variables have negative values. If we can obtain an
equation of the curve in this case, it should have a general application to all values
of k. Geary stated this problem but did not proceed to its solution; Fieller gave a
general algebraical solution.

We may consider K as lying on the circumference of the ellipse

\[ \frac{x^2}{a^2}+\frac{y^2}{b^2}=h^2, \]

where h = k/σ_α, and the ordinate on the circumference of the ellipse is

\[ z=\frac{1}{2\pi ab}\,e^{-\frac12h^2}. \]

In terms of the primary variables,

\[ h^2=\frac{1}{1-r^2}\left\{\frac{X^2}{\sigma_x^2}-\frac{2rXY}{\sigma_x\sigma_y}+\frac{Y^2}{\sigma_y^2}\right\}. \]

As before, the frequency curve of the angle θ is given by the positive moment of the
normal curve (4) about the ordinate at K. This moment may be considered in two
parts, the moment arising from the portion of the curve without the ellipse (A),
and the moment arising from the portion of the curve within the ellipse on which
K lies (B).

(A) For any normal curve the moment about the origin for that part of the
curve beyond the ordinate at a given deviation hσ is

\[ y_0\int_{h\sigma}^{\infty}x\,\exp\left(-\frac{x^2}{2\sigma^2}\right)dx, \]

which is equal to $y_0\,\sigma^2e^{-\frac12h^2}$, y_0 being the ordinate at the origin of the curve.
That is to say the moment is a function of the ordinate at the given deviation and
of the standard deviation. Moreover the moment about this ordinate of the two
equal tails of the curve is equal to the moment of these tails about the origin of the
curve, so that in our case the required moment is

\[ \frac{e^{-\frac12h^2}}{\pi}\,\frac{ab}{a^2\sin^2(\alpha+\theta)+b^2\cos^2(\alpha+\theta)}. \tag{8} \]

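(B) The length of a chord of the ellipse which passes through K and makes an
angle of α + θ with the major axis is

\[ \frac{2hab}{\sqrt{a^2\sin^2(\alpha+\theta)+b^2\cos^2(\alpha+\theta)}}\sqrt{1-\frac{k^2\sin^2\theta}{h^2\,\{a^2\sin^2(\alpha+\theta)+b^2\cos^2(\alpha+\theta)\}}}, \tag{9} \]

and if we make

\[ \sin\phi=\frac{k\sin\theta}{h\sqrt{a^2\sin^2(\alpha+\theta)+b^2\cos^2(\alpha+\theta)}}, \tag{10} \]

we may put (9) as 2h cos φ σ_(α+θ), where σ_(α+θ) is the standard deviation
$ab/\sqrt{a^2\sin^2(\alpha+\theta)+b^2\cos^2(\alpha+\theta)}$ of the section. The total area of the normal curve in this plane,
using (4), is

\[ \frac{1}{\sqrt{2\pi}\,\sqrt{a^2\sin^2(\alpha+\theta)+b^2\cos^2(\alpha+\theta)}}\,e^{-\frac12h^2\sin^2\phi}, \]

so that the ordinate at its origin is

\[ \frac{1}{2\pi ab}\,e^{-\frac12h^2\sin^2\phi}, \]

and the area of the portion within the ellipse is

\[ \frac{1}{\pi ab}\,e^{-\frac12h^2\sin^2\phi}\int_0^{h\cos\phi\,\sigma_{(\alpha+\theta)}}\exp\left[-\frac12\left(\frac{u}{\sigma_{(\alpha+\theta)}}\right)^2\right]du. \]

This is multiplied by h cos φ σ_(α+θ), giving

\[ \frac{1}{\pi ab}\,e^{-\frac12h^2\sin^2\phi}\,\sigma_{(\alpha+\theta)}^2\,h\cos\phi\int_0^{h\cos\phi}e^{-\frac12u^2}\,du, \tag{12} \]

which may be simplified into

\[ \frac{e^{-\frac12h^2}}{\pi}\,\frac{ab}{a^2\sin^2(\alpha+\theta)+b^2\cos^2(\alpha+\theta)}\left\{(h\cos\phi)^2+\frac13(h\cos\phi)^4+\frac{1}{3\cdot5}(h\cos\phi)^6+\dots\right\}; \tag{12a} \]

adding (8) and (12a), we have

\[ y=\frac{e^{-\frac12h^2}}{\pi}\,\frac{ab}{a^2\sin^2(\alpha+\theta)+b^2\cos^2(\alpha+\theta)}\left\{1+(h\cos\phi)^2+\frac13(h\cos\phi)^4+\frac{1}{3\cdot5}(h\cos\phi)^6+\dots\right\} \tag{13} \]

as the equation of the curve. This expression converges quite rapidly for values of
h up to 3; thereafter convergence becomes very slow indeed.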
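
A sketch of how the series in (13) may be computed and checked is given below. It assumes the reading of the bracketed series as 1 + t² + t⁴/3 + t⁶/(3·5) + ..., with t = h cos φ, and uses the relation dφ/dθ = ab/{a² sin²(α + θ) + b² cos²(α + θ)}, which the paper gives as (15), to express the curve per unit of φ. The function names are hypothetical.

```python
from math import cos, erf, exp, pi, sqrt

def bracket_series(t, terms=200):
    """1 + t^2 + t^4/3 + t^6/(3*5) + ..., summed term by term."""
    total, term = 1.0, 1.0
    for n in range(1, terms + 1):
        term *= t * t / (2 * n - 1)   # t^(2n) / (1*3*5*...*(2n-1))
        total += term
    return total

def bracket_closed(t):
    """The same quantity as 1 + t e^(t^2/2) * integral_0^t e^(-u^2/2) du."""
    return 1.0 + t * exp(0.5 * t * t) * sqrt(pi / 2) * erf(t / sqrt(2))

def density_per_phi(phi, h):
    """Curve (13) divided by d(phi)/d(theta), i.e. frequency per unit phi."""
    return exp(-0.5 * h * h) / pi * bracket_closed(h * cos(phi))

def total_frequency(h, steps=2000):
    """Midpoint-rule integral of the curve over phi from -pi/2 to pi/2."""
    width = pi / steps
    return width * sum(
        density_per_phi(-pi / 2 + (i + 0.5) * width, h) for i in range(steps)
    )

totals = {h: total_frequency(h) for h in (0.0, 1.0, 3.0)}
print(totals)
print(bracket_series(3.0), bracket_closed(3.0))
```

The total frequency comes out as unity for each h tried, which is a useful check on the reconstruction of (8), (12a) and (13).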
Reverting to (10), the value of sin φ may be put in this form

\[ \sin\phi=\frac{ab\sin\theta}{\sqrt{(a^2\sin^2\alpha+b^2\cos^2\alpha)\,\{a^2\sin^2(\alpha+\theta)+b^2\cos^2(\alpha+\theta)\}}}. \tag{10a} \]

From this the following identities may be established:

\[ \cos\phi=\frac{a^2\sin\alpha\sin(\alpha+\theta)+b^2\cos\alpha\cos(\alpha+\theta)}{\sqrt{(a^2\sin^2\alpha+b^2\cos^2\alpha)\,\{a^2\sin^2(\alpha+\theta)+b^2\cos^2(\alpha+\theta)\}}}, \tag{14} \]

\[ \frac{d\phi}{d\theta}=\frac{ab}{a^2\sin^2(\alpha+\theta)+b^2\cos^2(\alpha+\theta)}, \tag{15} \]

\[ \phi=\tan^{-1}\{(a/b)\tan(\alpha+\theta)\}-\tan^{-1}\{(a/b)\tan\alpha\}. \tag{16} \]

It should be noted that the limit of the integral in (5a) is h sin φ and that the value
of φ in (16) makes the practical work of calculating a series of these limits very
simple. If the distribution of θ is still to be regarded as round the median at α,
θ will have the limits as before, −(α + γ) and π − (α + γ), and at these limits φ will
be ∓½π; generally, however, θ may be regarded as having a range of π beginning
at any value, in which case φ will have the same range but with a different
distribution.
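
The identities (10a), (14), (15) and (16) can be checked against one another numerically. The sketch below assumes the forms in which they are read here, with hypothetical function names; the constants are arbitrary, and θ is kept within the range for which α + θ lies between −½π and ½π, so that the principal value of the inverse tangent applies.

```python
from math import atan, cos, sin, sqrt, tan

def spread(angle, a, b):
    """a^2 sin^2(angle) + b^2 cos^2(angle)."""
    return a**2 * sin(angle) ** 2 + b**2 * cos(angle) ** 2

def phi_16(theta, a, b, alpha):
    """phi from (16)."""
    return atan((a / b) * tan(alpha + theta)) - atan((a / b) * tan(alpha))

def sin_phi_10a(theta, a, b, alpha):
    """sin(phi) from (10a)."""
    return a * b * sin(theta) / sqrt(spread(alpha, a, b) * spread(alpha + theta, a, b))

def cos_phi_14(theta, a, b, alpha):
    """cos(phi) from (14)."""
    top = a**2 * sin(alpha) * sin(alpha + theta) + b**2 * cos(alpha) * cos(alpha + theta)
    return top / sqrt(spread(alpha, a, b) * spread(alpha + theta, a, b))

def dphi_dtheta_15(theta, a, b, alpha):
    """d(phi)/d(theta) from (15)."""
    return a * b / spread(alpha + theta, a, b)

a, b, alpha, theta = 3.0, 1.5, 0.5, 0.2   # arbitrary illustrative values
phi = phi_16(theta, a, b, alpha)
step = 1e-6
slope = (phi_16(theta + step, a, b, alpha) - phi_16(theta - step, a, b, alpha)) / (2 * step)
print(phi, sin_phi_10a(theta, a, b, alpha), cos_phi_14(theta, a, b, alpha), slope)
```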

A geometrical construction to show the relationship between θ and φ is given
in Fig. 3, where OA and OB are equal to a and b respectively, and O'AB is a
right-angled isosceles triangle on AB as hypotenuse.

Here AP = OP sin α / sin OAB
and PB = OP cos α / sin OBA,
so that AP/PB = (a/b) tan α;
by the same reasoning AP/PB = tan AO'P,
similarly AO'Q = tan⁻¹{(a/b) tan (α + θ)};
therefore PO'Q = φ.

This relationship between θ and φ shows that the solution consists essentially
in referring an asymmetrical system to an equivalent system where the standard
deviations are equal.





Fig. 3. Diagram to illustrate the relationship between θ and φ.


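If we now turn back to (6), this can be put in the form

\[ y=\frac{1}{\sqrt{2\pi}}\,\frac{hab\,\{a^2\sin\alpha\sin(\alpha+\theta)+b^2\cos\alpha\cos(\alpha+\theta)\}}{(a^2\sin^2\alpha+b^2\cos^2\alpha)^{1/2}\,\{a^2\sin^2(\alpha+\theta)+b^2\cos^2(\alpha+\theta)\}^{3/2}}\;e^{-\frac12h^2\sin^2\phi}, \tag{17} \]

so that we have the approximation of (6) to (13)

\[ \frac{1}{\sqrt{2\pi}}\,h\cos\phi\;e^{-\frac12h^2\sin^2\phi}\,\frac{d\phi}{d\theta}\;\simeq\;\frac{e^{-\frac12h^2}}{\pi}\,\frac{d\phi}{d\theta}\left\{1+(h\cos\phi)^2+\frac13(h\cos\phi)^4+\dots\right\}. \]

It will be seen that the last expression may be written

\[ \frac{e^{-\frac12h^2}}{\pi}\,\frac{d\phi}{d\theta}\left\{1+h\cos\phi\;e^{\frac12h^2\cos^2\phi}\int_0^{h\cos\phi}e^{-\frac12u^2}\,du\right\} \tag{18} \]

\[ =\frac{e^{-\frac12h^2}}{\pi}\,\frac{d\phi}{d\theta}\left\{1+\sqrt{\tfrac{\pi}{2}}\;h\cos\phi\;e^{\frac12h^2\cos^2\phi}-h\cos\phi\;e^{\frac12h^2\cos^2\phi}\int_{h\cos\phi}^{\infty}e^{-\frac12u^2}\,du\right\}. \tag{19} \]

The difference between the values of y given by the two equations (if we do not take
into account the value of dφ/dθ) is given in the table below for different values
of h and for φ = ½π and 0, showing that (6) must be a poor approximation to (13)
when h is less than 3, but that beyond that figure the difference rapidly becomes
negligible.

Difference between (6) and (13)

   h        φ = 0
   1       0.066475
   2       0.006776
   3       0.000306
   4       0.000006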
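
The tabulated differences can be recomputed as a check. The sketch below assumes the closed form of (13) per unit φ, namely (e^(−½h²)/π){1 + t e^(½t²) ∫₀ᵗ e^(−½u²) du} with t = h cos φ, and the form of (6) per unit φ, h cos φ e^(−½h² sin² φ)/√(2π); the function names are hypothetical. It evaluates the difference at φ = 0 and at φ = ½π.

```python
from math import cos, erf, exp, pi, sin, sqrt

def curve_13(phi, h):
    """Curve (13) per unit phi, using the closed form of the series."""
    t = h * cos(phi)
    inner = sqrt(pi / 2) * erf(t / sqrt(2))   # integral of exp(-u^2/2) from 0 to t
    return exp(-0.5 * h * h) / pi * (1 + t * exp(0.5 * t * t) * inner)

def curve_6(phi, h):
    """Curve (6) per unit phi."""
    return h * cos(phi) * exp(-0.5 * (h * sin(phi)) ** 2) / sqrt(2 * pi)

# For each h: (difference at phi = 0, difference at phi = pi/2).
differences = {
    h: (curve_13(0.0, h) - curve_6(0.0, h),
        curve_13(pi / 2, h) - curve_6(pi / 2, h))
    for h in (1, 2, 3, 4)
}
for h, (at_zero, at_half_pi) in differences.items():
    print(h, at_zero, at_half_pi)
```

On this reading the figures surviving in the table are those for φ = 0; at φ = ½π the curve (6) vanishes and the difference is simply e^(−½h²)/π.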








Fieller's general solution of the problem is given in his formula (24):


_ 1 Oyl) Le Oe ot (oo eh EAT 
V0) 7 raf iron m, +008 expl -; 31-- = : ty o, cà]. 


ry V 


αυ” 
+oxp| -5 20i— 2rvo 0, - v*o? 
- my(rUg s — Ko y) }-υσαίτασγ-- Yo) 
σγ({ῇσ,--Ἔσν) t vo, (rzo, — Ίσα) [σεσνία-- τὴ (σγ' -- 3γσκσγ!-οὐσαθ)ὰ c 
n(o2 — 2rvo o, + 303) jo 
v is the value of the ratio under consideration, which may be put as tan(α + δ + θ),
and if we reduce Fieller's formula with its symbols based on the co-ordinates of
the primary variables to the system with its symbols based on the principal axes,
we get the following identities:

\[ \sigma_y^2-2rv\sigma_x\sigma_y+v^2\sigma_x^2=(1+v^2)\,\{a^2\sin^2(\alpha+\theta)+b^2\cos^2(\alpha+\theta)\}, \]

\[ 1-r^2=\frac{a^2b^2}{\sigma_x^2\sigma_y^2}, \]

\[ \frac{\bar x^2}{\sigma_x^2}-\frac{2r\bar x\bar y}{\sigma_x\sigma_y}+\frac{\bar y^2}{\sigma_y^2}=\frac{k^2(a^2\sin^2\alpha+b^2\cos^2\alpha)}{\sigma_x^2\sigma_y^2}, \]

\[ \bar x\sigma_y(\sigma_y-rv\sigma_x)+\bar y\sigma_x(v\sigma_x-r\sigma_y)=k\sqrt{1+v^2}\,\{a^2\sin\alpha\sin(\alpha+\theta)+b^2\cos\alpha\cos(\alpha+\theta)\}, \]

so that Fieller's formula as a whole reduces to

\[ V(v)=\frac1\pi\,\frac{d\phi}{dv}\,e^{-\frac12h^2}\left\{1+h\cos\phi\;e^{\frac12h^2\cos^2\phi}\int_0^{h\cos\phi}e^{-\frac12u^2}\,du\right\}. \]

Here dv/dθ = 1 + v², so that the distribution V(v) of v given by Fieller's equation
is equivalent to the distribution of θ in the sum of (8) and (12).
The curves arising from (13) are of very great variety of form. They are all
limited, indeed they are more properly described as cyclical, the ordinates at the
limits for θ, since φ then equals ±½π, being both equal to

\[ \frac{e^{-\frac12h^2}}{\pi}\,\frac{a^4\sin^2\alpha+b^4\cos^2\alpha}{ab\,(a^2\sin^2\alpha+b^2\cos^2\alpha)}. \tag{20} \]

That being so there is no necessity to regard the curve as beginning and ending at
the limiting values for θ; in practice it would probably be taken to begin either
where α + θ is zero or where the ratio under consideration has a value of −∞, i.e.
where α + θ is −½π. The unit deviation in both equations (6) and (13) is the
radian; for practical work π/n would probably be used, which would require a
corresponding alteration in the equations. If we continue to regard the curve as
drawn with the median at α, we find that when h is zero it is U-shaped except when
α is small, and is more or less asymmetrical depending on the value of α. With
values of h about 3 it is possible to produce curves of a highly asymmetrical type;
as h increases beyond 3 the curve reverts to the bell-shape and the limits recede
from the bulk of the curve until, as we saw, when h is infinite the curve becomes
the normal curve and is unlimited. Fig. 4 is an example of the development of the
curve as h increases from zero to 3.

Fig. 4. Four curves from equation (13) to illustrate the change in form as h increases
from 0 to 3. The curves commence where α + θ = 0. Constants: a = 2, b = 1, α = 45°.




(5) INTEGRATION


The value of a frequency is the integration of y in (13) with respect to the
variable θ and, making use of (15), this is

\[ \frac{e^{-\frac12h^2}}{\pi}\int\left\{1+(h\cos\phi)^2+\frac13(h\cos\phi)^4+\frac{1}{3\cdot5}(h\cos\phi)^6+\dots\right\}d\phi. \tag{21} \]

An expression for this integral may be obtained by converting the series in
powers of cos φ into a series of cosines of multiples of φ; integration then gives a
series in sines of multiples of φ; but the functions of h which are the factors in the
series are very complicated and do not lend themselves to easy computation, so
that it is better to reconvert into a series in powers of cos φ, and (putting m for
½h²) this is

\[ Q(\phi)=\frac{\phi}{\pi}+\frac1\pi\sin\phi\cos\phi\,\Bigl\{(1-e^{-m})+\frac23\,(1-e^{-m}-me^{-m})\cos^2\phi+\frac{2\cdot4}{3\cdot5}\Bigl(1-e^{-m}-me^{-m}-\frac{m^2}{2!}e^{-m}\Bigr)\cos^4\phi \]
\[ \qquad\qquad+\frac{2\cdot4\cdot6}{3\cdot5\cdot7}\Bigl(1-e^{-m}-me^{-m}-\frac{m^2}{2!}e^{-m}-\frac{m^3}{3!}e^{-m}\Bigr)\cos^6\phi+\dots\Bigr\}, \tag{22} \]

where the occurrence of the terms of Poisson's series is very interesting.
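
The series with the Poisson terms can be compared with a direct numerical integration. The sketch below is made under stated assumptions: it takes the integral from φ = 0, takes the coefficient of cos^(2j) φ to be (2·4···2j)/(3·5···(2j+1)) times the Poisson tail 1 − e^(−m)(1 + m + ... + m^j/j!), and uses the closed form of the integrand; the function names are hypothetical.

```python
from math import cos, erf, exp, pi, sin, sqrt

def density_per_phi(phi, h):
    """Integrand of (21) with its constant factor, in closed form."""
    t = h * cos(phi)
    inner = sqrt(pi / 2) * erf(t / sqrt(2))   # integral of exp(-u^2/2) from 0 to t
    return exp(-0.5 * h * h) / pi * (1 + t * exp(0.5 * t * t) * inner)

def q_numeric(phi, h, steps=4000):
    """Midpoint-rule integral of the integrand from 0 to phi."""
    width = phi / steps
    return width * sum(density_per_phi((i + 0.5) * width, h) for i in range(steps))

def q_series(phi, h, terms=80):
    """Series in powers of cos(phi) with Poisson tails, m = h^2 / 2."""
    m = 0.5 * h * h
    coeff = 1.0                # (2*4*...*2j) / (3*5*...*(2j+1)), starting at j = 0
    poisson_term = exp(-m)     # e^(-m) m^j / j!
    poisson_sum = poisson_term
    total = 0.0
    for j in range(terms):
        total += coeff * (1 - poisson_sum) * cos(phi) ** (2 * j)
        coeff *= (2 * j + 2) / (2 * j + 3)
        poisson_term *= m / (j + 1)
        poisson_sum += poisson_term
    return phi / pi + sin(phi) * cos(phi) * total / pi

checks = [(q_numeric(0.7, h), q_series(0.7, h)) for h in (0.5, 2.0, 3.0)]
for numeric, series in checks:
    print(numeric, series)
```

At φ = ½π the series gives one half, as it should for a curve whose median lies at φ = 0.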


(6) CONCLUSION


The frequency distribution of the quotient of two normal variables may, then, 
give rise to most of the forms which are met with in statistical work. It is not, 
however, suggested that such statistical distributions always arise in this precise 
fashion; at the same time, from geometrical considerations, it seems likely that
the product of two variables would produce a similar set of curves. It is not
impossible that a large number of primary variables might group themselves into
two secondary variables of approximately normal distribution and that the final
distribution is some function involving either the quotient or the product of these
variables. However that may be, the fitting of this curve to any given distribution
appears to present many difficulties and is quite beyond the scope of this paper. 


(7) EXAMPLE 


The following example illustrates the practical use of ¢ in the application of 
Geary's approximation. Some of the difficulties of childbirth are undoubtedly due 
to a disproportion between the size of the foetal head and the size of the bony 
opening through which it has to pass, the brim of the pelvis. This difficulty
becomes absolute, i.e. demands caesarian section, in about 1 % of all cases (in
Guy's Hospital (1937) ten cases were dealt with by caesarian section on account
of disproportion out of 990 pregnancies in a fair sample of the population); it
would be well to know for the purposes of prognosis the percentage ratio between
the size of the passenger and the size of the passage beyond which spontaneous
delivery becomes impossible. The foetal head, as it passes, is roughly circular in
section and the area of the maximum section may be calculated from the biparietal
diameter; the figures for this diameter are taken from a series of 1010 measurements
by Ince (1939). It has been shown that the area of the pelvic opening is given
to a close enough approximation by the area of the ellipse on its antero-posterior
(conjugate) and transverse diameters; the figures for these are taken from a series
of 850 measurements made by radiology (Nicholson, 1938). It might be well to
add that the radiological method used (Nicholson, 1936) has a probable error of
accuracy as low as a millimetre.


These figures are 













Mean (mm.) 
Standard deviation (mm.) 
Coefficient of variation 


The distribution of these variables is normal, and the two latter are independent, 
so that we can estimate the following figures for the two areas: 





                                  Foetal head    Maternal pelvis

Mean (sq. cm.)                       65.8            121.0
Standard deviation (sq. cm.)          5.8             12.9
Coefficient of variation              8.8             10.7











The distribution of these variables is not, theoretically, normal but the error
from assuming normality will be negligible; we shall also assume that they are
independent, an assumption which is apparently not unreasonable. We may now
calculate the following constants for the frequency curve for the ratio:

a = 12.9;  b = 5.8;  α = tan⁻¹(Y/X) = tan⁻¹(65.8/121.0) = 28° 32'.25;
k = 137.734;  σ_α = 9.357;  h = 14.720.


From the tables of the normal curve we get the deviate for a frequency of
1 % as 2.3263, so, applying (5a), (10), and (16), we have

h sin φ = 2.3263,
sin φ = 0.15804,
φ = 9° 6',
tan⁻¹{(a/b) tan α} = 50° 25',
tan⁻¹{(a/b) tan (α + θ)} = 59° 31',
(a/b) tan (α + θ) = 1.69879,
tan (α + θ) = 0.764.
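
The steps of this calculation may be sketched as follows. The sketch assumes the standard deviation of the foetal head area to be 5.8 sq. cm. (the value consistent with the coefficient of variation 8.8 and with h = 14.720), takes h as k/σ_α, and obtains the 1 % deviate from Python's `statistics.NormalDist` instead of printed tables; small differences in the last figure from the values above are to be expected.

```python
from math import asin, atan, cos, degrees, hypot, sin, sqrt, tan
from statistics import NormalDist

a, b = 12.9, 5.8      # s.d. of pelvic and head areas, sq. cm. (b assumed to be 5.8)
X, Y = 121.0, 65.8    # mean pelvic and head areas, sq. cm.

alpha = atan(Y / X)                      # angle of K from the major axis
k = hypot(X, Y)                          # distance of K from the origin
sigma_alpha = a * b / sqrt(a**2 * sin(alpha) ** 2 + b**2 * cos(alpha) ** 2)
h = k / sigma_alpha

deviate = NormalDist().inv_cdf(0.99)     # normal deviate for a frequency of 1 %
phi = asin(deviate / h)                  # from h sin(phi) = deviate, (5a) and (10)
angle = atan((a / b) * tan(alpha)) + phi # (16) rearranged
ratio = (b / a) * tan(angle)             # tan(alpha + theta)

print(round(degrees(alpha), 3), round(k, 3), round(sigma_alpha, 3), round(h, 3))
print(round(degrees(phi), 2), round(ratio, 3))
```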


The required percentage ratio is then 76.4 %. Using the usual approximation
(Y/X){1 + (σ_x/X)²}, the mean of the ratio would be 56.6 %, and its standard
deviation, (Y/X)√{(σ_x/X)² + (σ_y/Y)²}, 7.5 %; if we had assumed that the distribution
of the ratio was normal, we should have got a result of 72.9 %; so that, even when h
is quite high, the distribution of the tails of the curve may be far from normal.

The value of the figure 76.4 from the point of view of prognosis is that we can
now predict that unless an event has occurred, the chances against which are
99 to 1, a pelvis with an area of 110 sq. cm. can pass 99.9 % of foetal heads, that
a pelvis of 100 sq. cm. can pass 97 %, that a pelvis of 90 sq. cm. can pass 70 %, but
that a pelvis of 80 sq. cm. can pass no more than 21 % of foetal heads.
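
These percentages follow from the limiting ratio and the distribution of head areas. The sketch below assumes that the head area is normal with mean 65.8 and standard deviation 5.8 sq. cm. (the latter being the reading adopted for the garbled table entry) and that a head passes when its area does not exceed 76.4 % of the pelvic area; the function name is hypothetical.

```python
from statistics import NormalDist

head = NormalDist(mu=65.8, sigma=5.8)   # head area, sq. cm. (sigma assumed 5.8)
limit_ratio = 0.764                     # limiting ratio of head to pelvic area

def percent_passable(pelvis_area):
    """Percentage of foetal heads no larger than the limit for this pelvis."""
    return 100 * head.cdf(limit_ratio * pelvis_area)

percentages = {area: percent_passable(area) for area in (110, 100, 90, 80)}
for area, value in percentages.items():
    print(area, round(value, 1))
```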


REFERENCES

FIELLER, E. C. (1932). Biometrika, 24, 428.
GEARY, R. C. (1930). J. R. Statist. Soc. 93, 442.
GUY'S HOSPITAL (1937). Clinical Report of the Maternity Department.
INCE, J. G. HASTINGS (1939). J. Obstet. Gynaec. 46, 1008.
NICHOLSON, C. (1936). Lancet, no. 231, 616.
NICHOLSON, C. (1938). J. Obstet. Gynaec. 45, 950.
UDNY YULE, G. (1932). Theory of Statistics, 10th ed. London: Griffin, 78.





THE STATISTICAL SIGNIFICANCE OF
CANONICAL CORRELATIONS


By M. S. BARTLETT


. 1. In an important paper published in this Journal, Hotelling (1936) has 
shown that the generalized variance matrix* 


of a vector variate X which has been partitioned into two parts x, and x, with, 
say, g and p components, can, by appropriate linear transformations L,x, and 
L,x, of x, and Χρ, be thrown into the canonical form 


L,VuL; Γι] ι R 
n Li κ, k 1 } 
R is a rectangular matrix which is zero oe for @ leading diagonal of μα, 
à? of canonical correlations. 

Similar operations on the estimated variance matrix $V$ give rise to estimated
canonical correlations $l_i$ which measure the correlations between estimates of the
linear functions $L_1\mathbf{x}_1$ and $L_2\mathbf{x}_2$. While Hotelling has given asymptotic standard
errors for the coefficients $l_i$, it is known that the significance of these correlations,
as in the simple case $p = 1$, is more generally to be interpreted as the significance
of the regression relations of $\mathbf{x}_2$ with $\mathbf{x}_1$, the validity of any exact tests of
significance depending on the supposition that the dependent variate $\mathbf{x}_2$, apart from
its linear dependence on $\mathbf{x}_1$, is normal.

Special cases of the simultaneous distribution of the correlations $l_i$, when $\mathbf{x}_1$
and $\mathbf{x}_2$ are unrelated, have been considered by Hotelling (1936) and Girschick
(1939), but an important theoretical advance is represented by the derivation of
the distribution (under the same conditions) for any values of $p$ and $q$ (Fisher,
1939; Hsu, 1939). It will be shown that this distribution makes available further
possible tests; and since the problem of the most appropriate tests of significance


* A matrix is usually denoted by a capital letter, and if it has both a population and sample
value, the population value is given in heavier type (cf. Bartlett, 1939). The transpose of any
matrix $A$ is denoted by $A'$. A matrix with only one column is a vector, and is often denoted by a
small letter. To avoid confusion, a vector variate $\mathbf{x}$ is written in heavier type throughout, to
distinguish it from a single variate $x$. If $\mathbf{x}$ is measured from its population mean, the variance matrix $\mathbf{V}$
is the average value of $\mathbf{x}\mathbf{x}'$. In practice we lose one or more degrees of freedom by measuring $\mathbf{x}$
from sample or regression means, but without loss of generality we shall suppose that our sample
consists of measurements of $\mathbf{x}$ with $\nu$ degrees of freedom.













has not always been considered very adequately by other writers, it is also the
purpose of this paper to explain the logical relation of these further tests to tests
of significance previously available.*


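2. For the case $p \le q$, the distribution of $l_1^2, l_2^2, \ldots, l_p^2$, when these roots are arranged in
order of magnitude, is given by

$$F(l_1^2, l_2^2, \ldots, l_p^2)\, dl_1^2\, dl_2^2 \ldots dl_p^2, \qquad (1)$$

where

$$F = C \prod_{i=1}^{p} \Big\{ (l_i^2)^{\frac12(q-p-1)} (1 - l_i^2)^{\frac12(\nu-p-q-1)} \prod_{j=i+1}^{p} (l_i^2 - l_j^2) \Big\}$$

and

$$C = \pi^{\frac12 p} \prod_{i=1}^{p} \frac{\Gamma\{\frac12(\nu-i+1)\}}{\Gamma\{\frac12(\nu-q-i+1)\}\,\Gamma\{\frac12(q-i+1)\}\,\Gamma\{\frac12(p-i+1)\}}.$$

For $p > q$, we need only reverse the roles of $\mathbf{x}_1$ and $\mathbf{x}_2$.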
A criterion which is useful in detecting the simultaneous departure of several
roots $\lambda_i$ from zero is the product†

$$\prod_{i=1}^{p} (1 - l_i^2) = \Lambda, \text{ say.}$$


When $p = 1$, the distribution of $\Lambda$ is equivalent to that of $l^2$, and the distribution
in (1) can be transformed if required into Fisher's $z$-distribution. When $p = 2$, it
was found by Wilks that a similar distribution exists for $\sqrt{\Lambda}$. For $p > 2$, no exact
test is at present available, but the formula

$$\chi^2 = -\{\nu - \tfrac12(p+q+1)\}\log \Lambda,$$

with $pq$ degrees of freedom, gives a good approximate test (Bartlett, 1938).
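The case $p = 1$ can be written out as a check on the notation. The sketch below assumes that the exponents and the constant in (1) are those of the Fisher and Hsu distribution, with $\nu$ the total degrees of freedom; the symbol $z$ is Fisher's, and the degrees of freedom $n_1 = q$, $n_2 = \nu - q$ are an inference from that assumption rather than a statement of the text.

$$F(l^2)\,dl^2 = \frac{\Gamma(\tfrac12\nu)}{\Gamma(\tfrac12 q)\,\Gamma\{\tfrac12(\nu-q)\}}\,(l^2)^{\frac12(q-2)}(1-l^2)^{\frac12(\nu-q-2)}\,dl^2, \qquad \Lambda = 1 - l^2,$$

$$z = \tfrac12 \log\frac{(\nu-q)\,l^2}{q\,(1-l^2)}, \qquad n_1 = q,\quad n_2 = \nu - q.$$

On this reading $l^2$ follows a Beta distribution with parameters $\tfrac12 q$ and $\tfrac12(\nu - q)$, which is the usual distribution of a squared multiple correlation when the true correlation is zero.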
If the roots $\lambda_2^2, \ldots, \lambda_p^2$ are zero, we are, however, including in $\Lambda$ irrelevant
degrees of freedom which might possibly obscure the significance of $\lambda_1^2$. For any
test on $\lambda_1^2$ by itself, we have little choice but to consider $l_1^2$, though we do not really
know whether $l_1^2$ is the root corresponding to $\lambda_1^2$ or not. The probability
distribution‡ $p(l_1^2)$ is theoretically obtainable from (1), and hence also 0.05 or 0.01 levels


* The distribution of $l_i^2$ obtained by Fisher and Hsu has also been obtained by Roy (1939),
though this writer was concerned with the different problem of comparing the dispersion in two
multivariate normal samples. For a single variate, testing the significance of a sum of squares
separated off from the total sum of squares by a multiple correlation or regression formula is
equivalent to testing the ratio of two variances, a criterion also employed to test the equality of
two population variances. Roy has proposed generalizing the latter problem along lines which give
rise to the same distribution problem solved by Fisher and Hsu, but while he has independently
obtained the same general distribution, the need for some care in the choice of tests in multivariate
analysis is even more evident in the problem with which Roy was concerned. It is obvious, for
example, that the $p$ roots which Roy considered cannot represent all the possible differences among
the $\tfrac12 p(p+1)$ variance and covariance parameters between two $p$-variate normal samples, and some
explanation of their interpretation seems required.

† This criterion has been proposed by Wilks (1932), Bartlett (1934), and Hotelling (1936), the
last-named denoting it by $z$.

‡ The probability of a random variable having a particular value $x$ is denoted by $p(x)$. If the
variable has a continuous range of values, $p(x)$ denotes the probability of the variable falling in the
interval $x$, $x + dx$. The corresponding notations $x \mid y$ and $p(x \mid y)$ are used when the variable is only
being considered for a fixed value $y$ of another variable. The probability symbol $p$ is not of course to
be confused with the number $p$ of components in the vector variate $\mathbf{x}_2$.




of significance of $l_1^2$ for specified values of $\nu$, $p$ and $q$. The tabulation of these levels
would be useful, but would also be a task of some magnitude, and it is therefore
worth noting that owing to the problem of identification, the largest root $l_1^2$ is not
a sufficient statistic for $\lambda_1^2$, and $p(l_1^2)$ has no unique relevance. If we consider,
instead, the distribution of $l_1^2$ for given values of $l_2^2, \ldots, l_p^2$, we have corresponding
to the probability relation

$$p(l_1^2, \ldots, l_p^2) = p(l_1^2 \mid l_2^2, \ldots, l_p^2)\; p(l_2^2, \ldots, l_p^2)$$

the probability density relation

$$F(l_1^2, \ldots, l_p^2) = f_1(l_1^2;\, l_2^2, \ldots, l_p^2)\; f_2(l_2^2, \ldots, l_p^2),$$

where the function $f_1$, apart from the constant term $f_2$, is determined at once from
the function $F$.

In the logical situation we are postulating where $\lambda_1^2$, but not the other roots, is
different from zero, it is not evident which distribution, $p(l_1^2)$ or $p(l_1^2 \mid l_2^2, \ldots, l_p^2)$,
provides the more powerful test, owing to the absence of sufficiency properties,
and it is of some interest to consider in detail another problem which is trivial in
itself, but serves to illustrate the principles involved.


3. Suppose we have a pair of variates $x_1$ and $x_2$ both independently following
a rectangular distribution

$$p(x) = dx, \quad (0 \le x \le 1).$$

One variate (unspecified) is then shifted a distance $a$, so that it follows the
distribution

$$p(x) = dx, \quad (a \le x \le 1 + a).$$


If $x_1$ and $x_2$ denote the variates in order of magnitude, we shall detect the shift
$a$ from the larger value, $x_1$, if $a$ is large enough. To compare the use of $p(x_1)$ and
$p(x_1 \mid x_2)$, we note first of all that when $a = 0$,

$$p(x_1) = 2x_1\,dx_1, \quad (0 \le x_1 \le 1),$$

$$p(x_1 \mid x_2) = \frac{dx_1}{1 - x_2}, \quad (x_2 \le x_1 \le 1).$$

For the significance level $\epsilon$, $p(x_1)$ gives a critical value $x_1 = \sqrt{1-\epsilon}$, while $p(x_1 \mid x_2)$
gives $x_1 = 1 - \epsilon(1 - x_2)$. If $a$ is different from zero, a peculiar feature (analogous to
the canonical correlation problem) is that the larger observation $x_1$ may or may
not be associated with $a$. For $p(x_1)$ we find

$$(2x_1 - a)\,dx_1 \quad (a \le x_1 \le 1), \qquad dx_1 \quad (1 \le x_1 \le 1 + a).$$


For $p(x_1 \mid x_2)$, we have

$$\frac{2\,dx_1}{a + 2(1 - x_2)} \quad (x_2 \le x_1 \le 1), \qquad \frac{dx_1}{a + 2(1 - x_2)} \quad (1 \le x_1 \le 1 + a).$$

This is provided $x_2 > a$; for $x_2 < a$, we have

$$p(x_1 \mid x_2) = dx_1, \quad (a \le x_1 \le 1 + a).$$


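Using the terminology of Neyman and Pearson, we shall denote the power of the
test derived from $x_1$ by $P$, and for that from $x_1 \mid x_2$ by $P'$. Then

$$1 - P = \int_a^{\sqrt{1-\epsilon}} (2x_1 - a)\,dx_1 = 1 - \epsilon - a\sqrt{1-\epsilon}.$$

For $1 - P'$, we have first of all, for given $x_2$, an integral

$$\int_{x_2}^{1-\epsilon(1-x_2)} p(x_1 \mid x_2),$$

which gives

$$1 - \epsilon(1 - x_2) - a \quad (x_2 < a), \qquad \frac{2(1-\epsilon)(1-x_2)}{a + 2(1-x_2)} \quad (x_2 > a).$$

Since $p(x_2 \mid a)$ is given by

$$\{a + 2(1 - x_2)\}\,dx_2 \quad (a \le x_2 \le 1), \qquad dx_2 \quad (0 \le x_2 \le a),$$

we finally obtain, after averaging over $x_2$,

$$1 - P' = (1-\epsilon)(1-a) - \tfrac12 a^2\epsilon.$$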
Before comparing $P$ with $P'$, we may remember that we do not expect either $x_1$
or $x_1 \mid x_2$ to provide the most powerful test obtainable. Theoretically we can see
what this test would be by considering the ratio

$$p(x_1, x_2 \mid a)/p(x_1, x_2 \mid 0) = X_a,$$

say, though since the value of $X_a$ is not known unless the true value of $a$ is
specified, it should be realized that $X_a$ does not provide us with any actual test,
only with a theoretical upper limit for $P$ or $P'$.
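The criterion $X_a$ has the distribution

$$\begin{array}{l|cccc} X_a & \infty & 1 & \tfrac12 & 0 \\ \hline p(X_a \mid a) & a & (1-a)^2 & a(1-a) & 0 \\ p(X_a \mid 0) & 0 & (1-a)^2 & 2a(1-a) & a^2 \end{array}$$

For $(1-a)^2 > \epsilon$, we shall allow the value $X_a = 1$ to be significant in $\epsilon/(1-a)^2$ of the
times that the value 1 occurs; if $(1-a)^2 < \epsilon$, we allow $X_a = \tfrac12$ to be significant in the
fraction

$$\frac{\epsilon - (1-a)^2}{2a(1-a)}$$

of the times that $X_a = \tfrac12$ occurs. The power $P''$ of a test that could be based on $X_a$
is then

$$a + \epsilon, \quad [(1-a)^2 > \epsilon],$$

$$a + (1-a)^2 + \tfrac12\{\epsilon - (1-a)^2\}, \quad [(1-a)^2 < \epsilon].$$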




Comparative values of $P$, $P'$ and $P''$ are given in Table 1 for $\epsilon = 0.05$ and 0.10.

Table 1





























It will be seen that $p(x_1)$ provides a test in this problem rather more powerful
than $p(x_1 \mid x_2)$, but that the latter is quite effective. We cannot of course transfer
this result to our main problem, but it is clear that $p(l_1^2 \mid l_2^2, \ldots, l_p^2)$ may justifiably
be considered, at least until the distribution $p(l_1^2)$ has been tabulated.
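The three powers of this section can be evaluated directly from the closed forms, and the first two checked by simulation. The following is a minimal sketch, not the author's computation: the function names and the grid of shifts are illustrative assumptions, the closed forms are taken as $P = \epsilon + a\sqrt{1-\epsilon}$, $P' = 1 - (1-\epsilon)(1-a) + \tfrac12 a^2\epsilon$ and the two-branch $P''$, and the shift is restricted to $0 \le a \le 1 - \epsilon$, where all three expressions apply.

```python
import math
import random


def power_marginal(a, eps):
    """P: power of the test on the larger value x1 alone (0 <= a <= sqrt(1 - eps))."""
    return eps + a * math.sqrt(1.0 - eps)


def power_conditional(a, eps):
    """P': power of the test on x1 given x2 (0 <= a <= 1 - eps)."""
    return 1.0 - ((1.0 - eps) * (1.0 - a) - 0.5 * a * a * eps)


def power_upper_limit(a, eps):
    """P'': theoretical upper limit from the likelihood ratio X_a."""
    if (1.0 - a) ** 2 > eps:
        return a + eps
    return a + (1.0 - a) ** 2 + 0.5 * (eps - (1.0 - a) ** 2)


def simulate(a, eps, n=200_000, seed=0):
    """Monte Carlo estimates of (P, P') for one shifted and one unshifted variate."""
    rng = random.Random(seed)
    crit = math.sqrt(1.0 - eps)
    hits_p = hits_pc = 0
    for _ in range(n):
        u = rng.random()          # unshifted rectangular variate on (0, 1)
        w = a + rng.random()      # shifted rectangular variate on (a, 1 + a)
        x1, x2 = max(u, w), min(u, w)
        hits_p += x1 > crit
        hits_pc += x1 > 1.0 - eps * (1.0 - x2)
    return hits_p / n, hits_pc / n


if __name__ == "__main__":
    for eps in (0.05, 0.10):
        for a in (0.1, 0.3, 0.5, 0.7, 0.9):   # illustrative grid, not that of Table 1
            print(eps, a,
                  round(power_marginal(a, eps), 3),
                  round(power_conditional(a, eps), 3),
                  round(power_upper_limit(a, eps), 3))
```

On this grid the ordering $P' \le P \le P''$ holds throughout, in line with the remark that $p(x_1)$ is rather more powerful than $p(x_1 \mid x_2)$ here.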


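4. Returning then to this distribution, we may examine one or two special
cases before formally noting the significance level of $l_1^2$ in general. It has been
shown by Fisher and Hsu that for $\nu$ large, the distribution of $l_1^2, l_2^2, \ldots, l_p^2$ tends to

$$G(m_1^2, m_2^2, \ldots, m_p^2)\, dm_1^2\, dm_2^2 \ldots dm_p^2,$$

where $m_i^2 = \tfrac12\nu l_i^2$,

$$G = C' \prod_{i=1}^{p} \Big\{ (m_i^2)^{\frac12(q-p-1)}\, e^{-m_i^2} \prod_{j=i+1}^{p} (m_i^2 - m_j^2) \Big\},$$

and

$$C' = \pi^{\frac12 p} \Big/ \prod_{i=1}^{p} \Gamma\{\tfrac12(q-i+1)\}\,\Gamma\{\tfrac12(p-i+1)\}. \qquad (2)$$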


For the particular case $p = 2$, $q = 3$, the distribution of $m_1^2 \mid m_2^2$ is

$$(m_1^2 - m_2^2)\, e^{-(m_1^2 - m_2^2)}\, dm_1^2,$$

which is a function simply of $m_1^2 - m_2^2$. If alternatively we consider the distribution
$p(m_1^2)$, we obtain

$$2e^{-m_1^2}\{e^{-m_1^2} - (1 - m_1^2)\}\, dm_1^2,$$

the 0.05 significance level for which is 5.37. From $p(m_1^2 \mid m_2^2)$ this value of 5.37
corresponds to a level 0.030 if $m_2^2 = 0$, to 0.045 if $m_2^2$ is equal to its expected value
0.50, and to 0.05 when $m_2^2$ reaches the value 0.63. These results merely illustrate
how the significance level of $m_1^2$ depends on which distribution is being used.
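The figures quoted for $p = 2$, $q = 3$ can be reproduced from the two densities. The sketch below is an illustration only: the function names are assumptions, and the tail areas are obtained by integrating the stated densities in closed form, giving $2c\,e^{-c} + e^{-2c}$ for $p(m_1^2)$ beyond $c$ and $(1+u)e^{-u}$, with $u = c - m_2^2$, for $p(m_1^2 \mid m_2^2)$.

```python
import math


def tail_marginal(c):
    """Upper tail beyond c of the density 2 e^{-x} {e^{-x} - (1 - x)}."""
    return 2.0 * c * math.exp(-c) + math.exp(-2.0 * c)


def tail_conditional(c, m2_sq):
    """Upper tail beyond c of the density u e^{-u}, where u = m1^2 - m2^2."""
    u = c - m2_sq
    return (1.0 + u) * math.exp(-u)


def solve_level(level, lo=0.0, hi=50.0):
    """Bisection for the value c at which tail_marginal(c) equals the given level."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if tail_marginal(mid) > level:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    c = solve_level(0.05)
    print(round(c, 2))                       # 0.05 point of p(m1^2)
    for m2_sq in (0.0, 0.5, 0.63):
        print(m2_sq, round(tail_conditional(5.37, m2_sq), 3))
```

The expected value 0.50 of $m_2^2$ follows from its marginal density $2e^{-2m_2^2}\,dm_2^2$, obtained by integrating $G$ over $m_1^2$.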

For the case $p = 3$, $q = 4$, the significance level for $m_1^2$ can be written

$$e^{-u}\Big\{1 + u + \frac{u^2}{2 + v}\Big\},$$

where $u = m_1^2 - m_2^2$, $v = m_2^2 - m_3^2$. The level of significance thus depends mainly on
$u$, as we should expect, but the effect of $v$ is not negligible. The factor multiplying
the exponential varies, for example, when $u = 4$, from 10.3 for $v = 1$ to 13 for
$v = 0$.
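The step from the limiting density to this level can be made explicit. The following is a sketch under the assumption that (2) has the exponent $\tfrac12(q-p-1)$ on $m_i^2$, which vanishes for $p = 3$, $q = 4$; the variable $t = m_1^2 - m_2^2$ is introduced here for illustration only.

$$p(m_1^2 \mid m_2^2, m_3^2) \propto e^{-m_1^2}(m_1^2 - m_2^2)(m_1^2 - m_3^2) \propto e^{-t}\,t\,(t + v), \qquad t \ge 0,$$

$$\frac{\int_u^\infty e^{-t}\,t\,(t+v)\,dt}{\int_0^\infty e^{-t}\,t\,(t+v)\,dt} = \frac{e^{-u}\{u^2 + 2u + 2 + v(u+1)\}}{2 + v} = e^{-u}\Big\{1 + u + \frac{u^2}{2+v}\Big\}.$$

At $u = 4$ the bracket equals $5 + 16/(2+v)$, that is 13 for $v = 0$ and $31/3 \approx 10.3$ for $v = 1$.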


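The general expression for the significance level for $m_1^2$ is

$$\frac{\displaystyle\int_{m_1^2}^{\infty} (m_1^2)^{\frac12(q-p-1)}\, e^{-m_1^2} \prod_{i=2}^{p} (m_1^2 - m_i^2)\, dm_1^2}{\displaystyle\int_{m_2^2}^{\infty} (m_1^2)^{\frac12(q-p-1)}\, e^{-m_1^2} \prod_{i=2}^{p} (m_1^2 - m_i^2)\, dm_1^2},$$

or, if we write $\Gamma_x(a) = \int_x^\infty t^{a-1} e^{-t}\,dt$, by

$$\frac{\Gamma_{m_1^2}\{\tfrac12(p+q-1)\} - \big(\sum_{i=2}^{p} m_i^2\big)\,\Gamma_{m_1^2}\{\tfrac12(p+q-3)\} + \ldots}{\Gamma_{m_2^2}\{\tfrac12(p+q-1)\} - \big(\sum_{i=2}^{p} m_i^2\big)\,\Gamma_{m_2^2}\{\tfrac12(p+q-3)\} + \ldots}. \qquad (3)$$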
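For the more general case of finite $\nu$, we have similarly for $l_1^2$,

$$\frac{\displaystyle\int_{l_1^2}^{1} (l_1^2)^{\frac12(q-p-1)} (1 - l_1^2)^{\frac12(\nu-p-q-1)} \prod_{i=2}^{p} (l_1^2 - l_i^2)\, dl_1^2}{\displaystyle\int_{l_2^2}^{1} (l_1^2)^{\frac12(q-p-1)} (1 - l_1^2)^{\frac12(\nu-p-q-1)} \prod_{i=2}^{p} (l_1^2 - l_i^2)\, dl_1^2},$$

or, if $B_x(a, b) = \int_x^1 t^{a-1}(1-t)^{b-1}\,dt$, by

$$\frac{B_{l_1^2}\{\tfrac12(p+q-1), \tfrac12(\nu-p-q+1)\} - \big(\sum_{i=2}^{p} l_i^2\big)\,B_{l_1^2}\{\tfrac12(p+q-3), \tfrac12(\nu-p-q+1)\} + \ldots}{B_{l_2^2}\{\tfrac12(p+q-1), \tfrac12(\nu-p-q+1)\} - \big(\sum_{i=2}^{p} l_i^2\big)\,B_{l_2^2}\{\tfrac12(p+q-3), \tfrac12(\nu-p-q+1)\} + \ldots}. \qquad (4)$$

The dependence of results (3) and (4) not only on $\nu$, $p$ and $q$, but also on the
particular values of $l_2^2, \ldots, l_p^2$, makes it impracticable to tabulate the 0.05 or other
levels of significance; but it is not difficult in any instance to find the exact level
from (3) or (4), using the published tables of $\Gamma_x(a)$ or $B_x(a, b)$.*

It must be recognized that if the second root $\lambda_2^2$ is also different from zero, the
distribution of $l_1^2$ for given $l_2^2$ is quite irrelevant, but except possibly when $p$ is
rather large, it is probable that two or more non-zero roots would be detected by
the $\Lambda$ criterion, and the testing of $\lambda_1^2$ alone by means of $l_1^2$ (a test which is still not
completely efficient) would not arise.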


5. Directly we have established the existence of at least one root $\lambda_1^2$, we may
always proceed to eliminate this correlation $\lambda_1$, and the corresponding pair of
canonical variates, and analyse the remainder. The theory of eliminating from
$\mathbf{x}_2$ a set of specified variates represented, say, by the vector variate $\mathbf{x}_r$, has been
* Tables of the Incomplete Γ-function, ed. K. Pearson (1922, His Majesty's Stationery Office,
London); Tables of the Incomplete Beta-function, ed. K. Pearson (1934, Biometrika Office,
University College, London).




indicated by Bartlett (1939).* As a particular case, $\mathbf{x}_r$ may be a hypothetical set
of $r$ canonical variates of $\mathbf{x}_2$, and the criterion $\Lambda(\nu - r, p - r, q)$ for the remaining
variates, in place of the original criterion $\Lambda(\nu, p, q)$ for $\mathbf{x}_2$, would test the goodness
of fit of the hypothetical vector canonical variate $\mathbf{x}_r$. In the case $q = 1$, we
have the goodness of fit of a hypothetical discriminant function, the problem of
which was first raised by Fisher (1938).

It has, however, also been pointed out (Bartlett, 1938, p. 39) that if the canonical
vector variate $\mathbf{x}_r$ has been estimated from the data, the symmetrical relation
between $\mathbf{x}_2$ and $\mathbf{x}_1$ will imply that each has only $p - r$ and $q - r$ independent
components remaining, the $\chi^2$ approximation for the criterion

$$\Lambda' = \prod_{i=r+1}^{p} (1 - l_i^2)$$

being

$$\chi^2 = -\{(\nu - r) - \tfrac12[(p-r) + (q-r) + 1]\}\log\Lambda' = -\{\nu - \tfrac12(p+q+1)\}\log\Lambda',$$

with $(p-r)(q-r)$ degrees of freedom. It was stressed that this reduction of the
degrees of freedom essentially depends on the existence of non-zero roots
$\lambda_1^2, \ldots, \lambda_r^2$, so that the vector variate $\mathbf{x}_r$ is well-determined, and any effect of
selection of $l_1^2, \ldots, l_r^2$ from $l_1^2, \ldots, l_p^2$ can be neglected. Under the same conditions,
we may approximately use the tests known for $p = 1$ or 2, for the criterion
$\Lambda'(\nu - r, p - r, q - r)$, when $p - r = 1$ or 2.


6. To demonstrate the reduction in degrees of freedom in the case $r = 1$,
consider the case when $\nu$ is large, and

$$-\nu\log\Lambda \sim \nu\sum_{i=1}^{p} l_i^2 \to \chi^2.$$

If $\nu l_i^2 = \theta_i$, the determinantal equation for $\theta_i$ is of the form

$$|A - \theta V| = 0,$$

where $V$ denotes the variance matrix of $\mathbf{x}_2$, and $A$ is a matrix of the sums of squares
and products among the $p$ variates of $\mathbf{x}_2$ for that portion of the sample separated
off in terms of the independent vector variate $\mathbf{x}_1$. Without loss of generality, we
shall suppose that $V = I$.

Regarding the $\nu$ observations for any variate as a vector with $\nu$ orthogonal
components, let us now add to the chance variation of the first variate of $\mathbf{x}_2$ a part
dependent on each of the $q$ (orthogonal) variates of $\mathbf{x}_1$. For each variate of $\mathbf{x}_1$
the length of the vector representing the first variate of $\mathbf{x}_2$ will then receive an
addition $X_k$, say, $(k = 1 \ldots q)$, which will be of order $\sqrt{\nu}$. Partitioning off the first
variate of $\mathbf{x}_2$ we obtain, as our new equation for $\theta$,

$$\begin{vmatrix} a_{11} + 2\sum x_1 X + \sum X^2 - \theta & a_{1j} + \sum x_j X \\ a_{i1} + \sum x_i X & a_{ij} - \theta \end{vmatrix} = 0 \qquad (i, j = 2, \ldots, p),$$

$\theta$ being subtracted from the diagonal elements only.


* See equation (2.8) of the paper cited, and the immediately preceding equation. 


The summation sign is for the $q$ degrees of freedom of $\mathbf{x}_1$, and $a_{ij} = \sum x_i x_j$, where
$x_1, \ldots, x_p$ are the $p$ variates of $\mathbf{x}_2$. Solving the equation for the largest root, we have

$$\theta_1 = \sum X^2 + 2\sum x_1 X + a_{11} + \sum_{i=2}^{p} \frac{(\sum x_i X)^2}{\sum X^2} + O\Big(\frac{1}{\sqrt{\nu}}\Big).$$

If we neglect the last term, the sum of the remaining roots becomes

$$\sum_{i=1}^{p}\theta_i - \theta_1 = \sum_{i=2}^{p}\Big\{a_{ii} - \frac{(\sum x_i X)^2}{\sum X^2}\Big\},$$

which is a $\chi^2$ with $(p-1)(q-1)$ degrees of freedom.
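The reduction in degrees of freedom can be seen numerically in the smallest case. The sketch below is an illustration under stated assumptions, not part of the original argument: it takes $p = 2$, $q = 3$, $V = I$, adds the same large hypothetical shift to each of the $q$ components of the first variate, and averages the smaller root of the $2 \times 2$ matrix $A$, which should be close to $(p-1)(q-1) = 2$.

```python
import math
import random


def smaller_root_mean(q=3, shift=50.0, reps=4000, seed=1):
    """Mean of the smaller latent root of A for p = 2 with a shifted first variate."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        # q orthogonal components for each of the two variates of x_2
        x1 = [rng.gauss(0.0, 1.0) + shift for _ in range(q)]   # chance part plus X_k
        x2 = [rng.gauss(0.0, 1.0) for _ in range(q)]
        a11 = sum(v * v for v in x1)
        a22 = sum(v * v for v in x2)
        a12 = sum(u * v for u, v in zip(x1, x2))
        trace = a11 + a22
        det = a11 * a22 - a12 * a12
        larger = 0.5 * (trace + math.sqrt(trace * trace - 4.0 * det))
        total += det / larger          # smaller root, computed stably
    return total / reps


if __name__ == "__main__":
    print(smaller_root_mean())   # close to (p - 1)(q - 1) = 2, not (p - 1) q = 3
```

Without the shift the two roots share $pq = 6$ degrees of freedom between them; with a large shift the first root absorbs the shifted direction and the remainder behaves as a $\chi^2$ with 2 degrees of freedom.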


7. To illustrate the use of this test we may consider the data from Kelley
quoted by Hotelling (1936), these consisting of correlations among tests in (1) reading
speed, (2) reading power, (3) arithmetic speed and (4) arithmetic power, the
sample being one of 140 seventh-grade school children. Hotelling, investigating
the relation of arithmetical with reading abilities, found canonical correlations

$$l_1 = 0.3945, \quad l_2 = 0.0688.$$

Since $\nu = 139$, $p = 2$, $q = 2$, the first correlation gives a contribution to $\chi^2$ of

$$-(139 - \tfrac12(2 + 2 + 1))\log(1 - 0.3945^2) = 23.09.$$

Similarly the contribution from $l_2$ is 0.64. The $\chi^2$ analysis is consequently
summarized as in Table 2.


Table 2 














It is evident at once, as Hotelling concluded from other tests, that there is a 
significant relation between arithmetical and reading abilities, which arises 
entirely from the first canonical correlation. 
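The arithmetic of this example can be sketched as follows. The function name and the split of the degrees of freedom printed at the end are assumptions made for illustration: the total is taken as $pq$ and the remainder after the first root as $(p-1)(q-1)$, following Sections 2 and 5, and natural logarithms are assumed.

```python
import math


def bartlett_contributions(corrs, nu, p, q):
    """Contribution -{nu - (p + q + 1)/2} log(1 - l_i^2) of each correlation to chi-square."""
    factor = nu - 0.5 * (p + q + 1)
    return [-factor * math.log(1.0 - l * l) for l in corrs]


def chi2_tail_df1(x):
    """Upper tail probability of chi-square with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))


# Canonical correlations and sample constants quoted in the text.
contrib = bartlett_contributions([0.3945, 0.0688], nu=139, p=2, q=2)
total = sum(contrib)
df_total = 2 * 2
df_remainder = (2 - 1) * (2 - 1)

if __name__ == "__main__":
    print("first root :", round(contrib[0], 2))
    print("remainder  :", round(contrib[1], 2), "on", df_remainder, "d.f.")
    print("total      :", round(total, 2), "on", df_total, "d.f.")
    print("tail probability of remainder:", round(chi2_tail_df1(contrib[1]), 2))
```

The remainder is far from significant on 1 degree of freedom, which agrees with the conclusion that the relation arises entirely from the first canonical correlation.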




REFERENCES 


BARTLETT, M. S. (1934). "The vector representation of a sample." Proc. Camb. Phil. Soc.
30, 327-40.
BARTLETT, M. S. (1938). "Further aspects of the theory of multiple regression." Proc. Camb. Phil.
Soc. 34, 33-40.
BARTLETT, M. S. (1939). "A note on tests of significance in multivariate analysis." Proc. Camb. Phil.
Soc. 35, 180-5.
FISHER, R. A. (1938). "The statistical utilization of multiple measurements." Ann. Eugen.
8, 376-86.
FISHER, R. A. (1939). "The sampling distribution of some statistics obtained from non-linear
equations." Ann. Eugen. 9, 238-49.
GIRSCHICK, M. A. (1939). "On the sampling theory of the roots of determinantal equations."
Ann. Math. Statist. 10, 203-24.
HOTELLING, H. (1936). "Relations between two sets of variates." Biometrika, 28, 321-77.
HSU, P. L. (1939). "On the distribution of roots of certain determinantal equations." Ann.
Eugen. 9, 260-8.
ROY, S. N. (1939). "p-statistics or some generalizations in analysis of variance appropriate
to multivariate problems." Sankhyā, 4, 381-96.
WILKS, S. S. (1932). "Certain generalizations in the analysis of variance." Biometrika,
24, 471-94.




ON THE LIMITING DISTRIBUTION OF THE CANONICAL 
CORRELATIONS 


By P. L. HSU


1. The purpose of this paper is to deduce the limiting distribution of Hotelling's
canonical correlations* under the most general assumption on the population
canonical correlations. The result is stated in the theorem at the end of this
paper.

The method employed here is essentially the same as that by which we derived
(Hsu, 1940) the limiting distribution of Fisher's discriminating components.
In what follows, steps in the derivation are given while strictly rigorous reasoning
is left out. The latter may be found in the author's 1940 paper.

The parent distribution is represented by the density

$$\text{const.}\exp\Big[-\tfrac12\Big(\sum_{i,j=1}^{p}\alpha_{ij}a_{ij} + 2\sum_{i=1}^{p}\sum_{g=1}^{q}\beta_{ig}b_{ig} + \sum_{g,h=1}^{q}\gamma_{gh}c_{gh}\Big)\Big] \quad (p \le q), \qquad (1)$$

where

$$a_{ij} = \sum x_i x_j, \quad b_{ig} = \sum x_i y_g, \quad c_{gh} = \sum y_g y_h, \qquad (2)$$

the summations extending over the $n$ observations of the sample.


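By virtue of Hotelling's reduction, the matrix of variances and covariances is
taken to be

$$\begin{Vmatrix} \alpha_{11} & \cdots & \alpha_{1p} & \beta_{11} & \cdots & \beta_{1q} \\ \vdots & & \vdots & \vdots & & \vdots \\ \alpha_{p1} & \cdots & \alpha_{pp} & \beta_{p1} & \cdots & \beta_{pq} \\ \beta_{11} & \cdots & \beta_{p1} & \gamma_{11} & \cdots & \gamma_{1q} \\ \vdots & & \vdots & \vdots & & \vdots \\ \beta_{1q} & \cdots & \beta_{pq} & \gamma_{q1} & \cdots & \gamma_{qq} \end{Vmatrix}^{-1} = \begin{Vmatrix} 1 & \cdots & 0 & \rho_1 & \cdots & 0 & 0 & \cdots & 0 \\ \vdots & \ddots & \vdots & \vdots & \ddots & \vdots & \vdots & & \vdots \\ 0 & \cdots & 1 & 0 & \cdots & \rho_p & 0 & \cdots & 0 \\ \rho_1 & \cdots & 0 & 1 & \cdots & 0 & 0 & \cdots & 0 \\ \vdots & \ddots & \vdots & \vdots & \ddots & \vdots & \vdots & & \vdots \\ 0 & \cdots & \rho_p & 0 & \cdots & 1 & 0 & \cdots & 0 \\ 0 & \cdots & 0 & 0 & \cdots & 0 & 1 & \cdots & 0 \\ \vdots & & \vdots & \vdots & & \vdots & \vdots & \ddots & \vdots \\ 0 & \cdots & 0 & 0 & \cdots & 0 & 0 & \cdots & 1 \end{Vmatrix}, \qquad (3)$$

where $\rho_1, \ldots, \rho_p$ are the population canonical correlations. The sample canonical
correlations, $r_1, \ldots, r_p$, are the positive roots of the equation†

$$\begin{vmatrix} ra_{11} & \cdots & ra_{1p} & b_{11} & \cdots & b_{1q} \\ \vdots & & \vdots & \vdots & & \vdots \\ ra_{p1} & \cdots & ra_{pp} & b_{p1} & \cdots & b_{pq} \\ b_{11} & \cdots & b_{p1} & rc_{11} & \cdots & rc_{1q} \\ \vdots & & \vdots & \vdots & & \vdots \\ b_{1q} & \cdots & b_{pq} & rc_{q1} & \cdots & rc_{qq} \end{vmatrix} = 0. \qquad (4)$$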

* Hotelling (1936). For further work on the distribution of the canonical correlations,
see Madow (1938), Girschick (1939) and Hsu (1939).
† We use $r$ in (4) instead of $-r$ as in Hotelling's original definition because it is
known that the non-vanishing roots of (4) form pairs each of which have the same
absolute value but opposite signs.


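We set

$$\begin{aligned} \rho_1 = \ldots = \rho_{\mu_1} &= \rho'_1, \\ \rho_{\mu_1+1} = \ldots = \rho_{\mu_1+\mu_2} &= \rho'_2, \\ &\;\;\vdots \\ \rho_{\mu_1+\ldots+\mu_{\nu-1}+1} = \ldots = \rho_s &= \rho'_\nu, \end{aligned} \qquad (5)$$

$$\rho_{s+1} = \ldots = \rho_p = 0, \qquad (6)$$

$$\rho'_1 > \rho'_2 > \ldots > \rho'_\nu > 0, \qquad (7)$$

$$r_1 \ge r_2 \ge \ldots \ge r_p, \qquad (8)$$

and proceed to find the limiting distribution of $r_1, \ldots, r_p$ as $n \to \infty$.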
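2. LEMMA 1. We have the identity

$$\begin{vmatrix} P & Q \\ R & S \end{vmatrix} = |S| \cdot |P - QS^{-1}R|. \qquad (9)$$

This results from the identity

$$\begin{Vmatrix} P & Q \\ R & S \end{Vmatrix} \cdot \begin{Vmatrix} I & 0 \\ -S^{-1}R & I \end{Vmatrix} = \begin{Vmatrix} P - QS^{-1}R & Q \\ 0 & S \end{Vmatrix}$$

on taking determinants on both sides.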
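Identity (9) is easily checked numerically. The sketch below is an illustration only: the helper functions and the particular matrices are hypothetical choices, with $P$ of order 2, $S$ of order 3 and non-singular, and plain Gaussian elimination used in place of any library routine.

```python
def det(m):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in m]
    n = len(a)
    d = 1.0
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(a[r][c]))
        if a[piv][c] == 0.0:
            return 0.0
        if piv != c:
            a[c], a[piv] = a[piv], a[c]
            d = -d                      # a row interchange changes the sign
        d *= a[c][c]
        for r in range(c + 1, n):
            f = a[r][c] / a[c][c]
            for k in range(c, n):
                a[r][k] -= f * a[c][k]
    return d


def solve(s, r):
    """Return S^{-1} R by Gauss-Jordan elimination on the augmented array [S | R]."""
    n = len(s)
    aug = [list(s[i]) + list(r[i]) for i in range(n)]
    for c in range(n):
        piv = max(range(c, n), key=lambda i: abs(aug[i][c]))
        aug[c], aug[piv] = aug[piv], aug[c]
        pivot = aug[c][c]
        aug[c] = [v / pivot for v in aug[c]]
        for i in range(n):
            if i != c:
                f = aug[i][c]
                aug[i] = [vi - f * vc for vi, vc in zip(aug[i], aug[c])]
    return [row[n:] for row in aug]


def matmul(x, y):
    """Ordinary matrix product of two lists of rows."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]


# Hypothetical blocks chosen only to exercise the identity.
P = [[2.0, 1.0], [0.0, 3.0]]
Q = [[1.0, 0.0, 2.0], [1.0, 1.0, 0.0]]
R = [[1.0, 2.0], [0.0, 1.0], [3.0, 0.0]]
S = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]

full = [p + q for p, q in zip(P, Q)] + [r + s for r, s in zip(R, S)]
qsr = matmul(Q, solve(S, R))
schur = [[P[i][j] - qsr[i][j] for j in range(2)] for i in range(2)]
lhs = det(full)                    # determinant of the partitioned matrix
rhs = det(S) * det(schur)          # |S| . |P - Q S^{-1} R|
```

The same identity is what turns (4) into an equation in $r^2$ alone in Section 4.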
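LEMMA 2. Let

$$\begin{aligned} a_{ii} &= n + n^{1/2}u_{ii}, & a_{ij} &= n^{1/2}u_{ij} \quad (i \ne j), \\ b_{ii} &= n\rho_i + n^{1/2}v_{ii}, & b_{ig} &= n^{1/2}v_{ig} \quad (i \ne g), \\ c_{gg} &= n + n^{1/2}w_{gg}, & c_{gh} &= n^{1/2}w_{gh} \quad (g \ne h), \end{aligned} \qquad (10)$$

$$(i, j = 1, \ldots, p;\; g, h = 1, \ldots, q).$$

The distribution of the $u$'s, $v$'s and $w$'s approaches that of $\tfrac12(p+q)(p+q+1)$ normal
variates whose means are zero and whose second moments are specified in the following
statements:

(i) any $v$ or $w$ which has at least one suffix number $> p$ is uncorrelated with all
the others;

(ii) any member of one of the sets $(u_{ii}, v_{ii}, w_{ii})$, $(i = 1, \ldots, p)$, $(u_{ij}, v_{ij}, v_{ji}, w_{ij})$
$(i, j = 1, \ldots, p;\; i < j)$ is uncorrelated with all the members of all the other sets;

(iii)

$$\left.\begin{aligned} &E(u_{ii}^2) = E(w_{ii}^2) = 2, \\ &E(v_{ii}^2) = 1 + \rho_i^2, \\ &E(u_{ii}w_{ii}) = 2\rho_i^2, \\ &E(u_{ii}v_{ii}) = E(v_{ii}w_{ii}) = 2\rho_i, \\ &E(u_{ij}^2) = E(v_{ij}^2) = E(w_{ij}^2) = 1 \quad (i \ne j), \\ &E(u_{ij}w_{ij}) = E(v_{ij}v_{ji}) = \rho_i\rho_j \quad (i \ne j), \\ &E(u_{ij}v_{ij}) = E(w_{ij}v_{ji}) = \rho_j \quad (i \ne j). \end{aligned}\right\} \qquad (11)$$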

This follows immediately from the well-known central limit theorem.* The
second moments may easily be computed by virtue of (3).

COROLLARY. Under the assumption (6) any $u$, $v$ or $w$ which has at least one suffix
number $> s$ is uncorrelated with all the others, and $E(v_{ig}^2) = 1$ for $i = s+1, \ldots, p$.
This results from (11) on putting $\rho_i = 0$ for $i = s+1, \ldots, p$.


3. We may now find the limiting distribution of $r_{s+1}, \ldots, r_p$, the $p - s$ smallest
correlations.

We substitute (10) in (4) and then divide each element by $n$. There results
the equation, written here in partitioned form,

$$\begin{vmatrix} r(I + n^{-1/2}U) & \Lambda + n^{-1/2}V \\ \Lambda' + n^{-1/2}V' & r(I + n^{-1/2}W) \end{vmatrix} = 0, \qquad (12)$$

where $U = \|u_{ij}\|$, $V = \|v_{ig}\|$, $W = \|w_{gh}\|$, and $\Lambda$ is the matrix of $p$ rows and $q$ columns
with $\rho_1, \ldots, \rho_p$ in its leading diagonal and zeros elsewhere, the elements
$\rho_{s+1}, \ldots, \rho_p$ vanishing by (6).


The equation (12) has $p - s$ roots which are $o(1)$ for large $n$. To evaluate these, we
substitute $n^{-1/2}\zeta$ for $r$ in (12), delete the common factor $n^{-1/2}$ from the rows

$$s+1, s+2, \ldots, p, \quad p+s+1, p+s+2, \ldots, p+q,$$


* Cf. Cramér (1937), pp. 118-14. 


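and then let $n \to \infty$. There results the equation

$$\begin{vmatrix} 0 & \cdots & 0 & 0 & \cdots & 0 & \rho_1 & \cdots & 0 & 0 & \cdots & 0 \\ \vdots & & \vdots & \vdots & & \vdots & \vdots & \ddots & \vdots & \vdots & & \vdots \\ 0 & \cdots & 0 & 0 & \cdots & 0 & 0 & \cdots & \rho_s & 0 & \cdots & 0 \\ 0 & \cdots & 0 & \zeta & \cdots & 0 & v_{s+1,1} & \cdots & v_{s+1,s} & v_{s+1,s+1} & \cdots & v_{s+1,q} \\ \vdots & & \vdots & \vdots & \ddots & \vdots & \vdots & & \vdots & \vdots & & \vdots \\ 0 & \cdots & 0 & 0 & \cdots & \zeta & v_{p1} & \cdots & v_{ps} & v_{p,s+1} & \cdots & v_{pq} \\ \rho_1 & \cdots & 0 & 0 & \cdots & 0 & 0 & \cdots & 0 & 0 & \cdots & 0 \\ \vdots & \ddots & \vdots & \vdots & & \vdots & \vdots & & \vdots & \vdots & & \vdots \\ 0 & \cdots & \rho_s & 0 & \cdots & 0 & 0 & \cdots & 0 & 0 & \cdots & 0 \\ v_{1,s+1} & \cdots & v_{s,s+1} & v_{s+1,s+1} & \cdots & v_{p,s+1} & 0 & \cdots & 0 & \zeta & \cdots & 0 \\ \vdots & & \vdots & \vdots & & \vdots & \vdots & & \vdots & \vdots & \ddots & \vdots \\ v_{1q} & \cdots & v_{sq} & v_{s+1,q} & \cdots & v_{pq} & 0 & \cdots & 0 & 0 & \cdots & \zeta \end{vmatrix} = 0,$$

i.e.

$$\begin{vmatrix} \zeta & \cdots & 0 & v_{s+1,s+1} & \cdots & v_{s+1,q} \\ \vdots & \ddots & \vdots & \vdots & & \vdots \\ 0 & \cdots & \zeta & v_{p,s+1} & \cdots & v_{pq} \\ v_{s+1,s+1} & \cdots & v_{p,s+1} & \zeta & \cdots & 0 \\ \vdots & & \vdots & \vdots & \ddots & \vdots \\ v_{s+1,q} & \cdots & v_{pq} & 0 & \cdots & \zeta \end{vmatrix} = 0. \qquad (13)$$

By (9) the left-hand side of (13) is equal to

$$\zeta^{q-p} \begin{vmatrix} \zeta^2 - d_{s+1,s+1} & \cdots & -d_{s+1,p} \\ \vdots & \ddots & \vdots \\ -d_{p,s+1} & \cdots & \zeta^2 - d_{pp} \end{vmatrix}, \qquad (14)$$

where

$$d_{ij} = \sum_{g=s+1}^{q} v_{ig}v_{jg} \quad (i, j = s+1, \ldots, p). \qquad (15)$$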


Let $\zeta'_{s+1}, \ldots, \zeta'_p$, in descending order of magnitude, be the latent roots of the
matrix $\|d_{ij}\|$. Then the $p - s$ roots of (12) which are $o(1)$ for large $n$ are

$$n^{-1/2}\zeta_i'^{\,1/2} + o(n^{-1/2}) \quad (i = s+1, \ldots, p).$$

If we define $\zeta_{s+1}, \ldots, \zeta_p$ by putting

$$r_i = n^{-1/2}\zeta_i^{1/2} \quad (i = s+1, \ldots, p), \qquad (16)$$

then the $\zeta$'s have the same limiting distribution as the $\zeta'$'s. Hence the limiting
distribution of the $\zeta_i$ may be derived as the distribution of the latent roots of
$\|d_{ij}\|$, in which the $v$'s are regarded as having a distribution which is the limiting
distribution described in Lemma 2. By virtue of the Corollary this is the
distribution of $(p-s)(q-s)$ mutually independent normal variates with zero mean

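and unit standard deviation. Therefore* the limiting distribution of the $\zeta_i = nr_i^2$
has the density function

$$2^{-\frac12(p-s)(q-s)}\,\pi^{\frac12(p-s)} \Big[\prod_{i=s+1}^{p} \Gamma^{-1}\{\tfrac12(q-i+1)\}\,\Gamma^{-1}\{\tfrac12(p-i+1)\}\Big] \Big[\prod_{i=s+1}^{p-1}\prod_{j=i+1}^{p} (\zeta_i - \zeta_j)\Big] \Big(\prod_{i=s+1}^{p}\zeta_i\Big)^{\frac12(q-p-1)} \exp\Big(-\tfrac12\sum_{i=s+1}^{p}\zeta_i\Big), \qquad (17)$$

$$\infty > \zeta_{s+1} \ge \ldots \ge \zeta_p \ge 0.$$

The transformation $\zeta_i = \eta_i^2$ gives the following density for the limiting
distribution of the $\eta_i = n^{1/2}r_i$ $(i = s+1, \ldots, p)$:

$$f_1(\eta_{s+1}, \ldots, \eta_p) = 2^{-\frac12(p-s)(q-s-2)}\,\pi^{\frac12(p-s)} \Big[\prod_{i=s+1}^{p} \Gamma^{-1}\{\tfrac12(q-i+1)\}\,\Gamma^{-1}\{\tfrac12(p-i+1)\}\Big] \Big[\prod_{i=s+1}^{p-1}\prod_{j=i+1}^{p} (\eta_i^2 - \eta_j^2)\Big] \Big(\prod_{i=s+1}^{p}\eta_i\Big)^{q-p} \exp\Big(-\tfrac12\sum_{i=s+1}^{p}\eta_i^2\Big), \qquad (18)$$

$$\infty > \eta_{s+1} \ge \ldots \ge \eta_p \ge 0.$$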
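A simple check on the constant is the case of a single vanishing population correlation. The sketch below assumes the form of (17) with the factor $2^{-\frac12(p-s)(q-s)}\pi^{\frac12(p-s)}$ and puts $s = p - 1$, so that the double product is empty; writing $N = q - p + 1$ is a notation introduced here for illustration.

$$2^{-\frac12 N}\,\pi^{\frac12}\,\Gamma^{-1}(\tfrac12 N)\,\Gamma^{-1}(\tfrac12)\;\zeta_p^{\frac12 N - 1}\,e^{-\frac12\zeta_p} = \frac{\zeta_p^{\frac12 N - 1}\,e^{-\frac12\zeta_p}}{2^{\frac12 N}\,\Gamma(\tfrac12 N)}, \qquad N = q - p + 1.$$

On this reading $nr_p^2$ is distributed in the limit as $\chi^2$ with $q - p + 1$ degrees of freedom, which is $(p-s)(q-s)$ for $s = p - 1$.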


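4. We now proceed to find the limiting distribution of $r_1, \ldots, r_s$. By virtue
of (9) we may write the left-hand side of (4) as

$$r^{q-p}\,|C| \cdot |r^2A - BC^{-1}B'|,$$

where

$$A = \begin{Vmatrix} a_{11} & \cdots & a_{1p} \\ \vdots & & \vdots \\ a_{p1} & \cdots & a_{pp} \end{Vmatrix}, \quad B = \begin{Vmatrix} b_{11} & \cdots & b_{1q} \\ \vdots & & \vdots \\ b_{p1} & \cdots & b_{pq} \end{Vmatrix}, \quad C = \begin{Vmatrix} c_{11} & \cdots & c_{1q} \\ \vdots & & \vdots \\ c_{q1} & \cdots & c_{qq} \end{Vmatrix}. \qquad (19)$$

Hence, if we set

$$\theta_i = r_i^2 \quad (i = 1, \ldots, p), \qquad (20)$$

the $\theta_i$ are the roots of the equation

$$|BC^{-1}B' - \theta A| = 0. \qquad (21)$$

Substituting (10) in (21) and dividing each element by $n$, we get

$$|(\Lambda + n^{-1/2}V)(I + n^{-1/2}W)^{-1}(\Lambda' + n^{-1/2}V') - \theta(I + n^{-1/2}U)| = 0, \qquad (22)$$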






































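where

$$U = \begin{Vmatrix} u_{11} & \cdots & u_{1p} \\ \vdots & & \vdots \\ u_{p1} & \cdots & u_{pp} \end{Vmatrix}, \quad V = \begin{Vmatrix} v_{11} & \cdots & v_{1q} \\ \vdots & & \vdots \\ v_{p1} & \cdots & v_{pq} \end{Vmatrix}, \quad W = \begin{Vmatrix} w_{11} & \cdots & w_{1q} \\ \vdots & & \vdots \\ w_{q1} & \cdots & w_{qq} \end{Vmatrix}, \qquad (23)$$

$$\Lambda = \begin{Vmatrix} \rho_1 & \cdots & 0 & 0 & \cdots & 0 \\ \vdots & \ddots & \vdots & \vdots & & \vdots \\ 0 & \cdots & \rho_p & 0 & \cdots & 0 \end{Vmatrix}. \qquad (24)$$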











* Hsu (1939), pp. 256-7.




Neglecting higher powers of $n^{-1/2}$ we write $I - n^{-1/2}W$ for $(I + n^{-1/2}W)^{-1}$ and carry
out the matrix multiplication to the term with $n^{-1/2}$ as a factor. There results the
equation

$$|\Lambda\Lambda' + n^{-1/2}(V\Lambda' + \Lambda V' - \Lambda W\Lambda') - \theta(I + n^{-1/2}U)| = 0, \qquad (25)$$

i.e.

$$\begin{vmatrix} \rho_1^2 - \theta + n^{-1/2}(2\rho_1v_{11} - \rho_1^2w_{11} - \theta u_{11}) & \cdots & n^{-1/2}(\rho_pv_{1p} + \rho_1v_{p1} - \rho_1\rho_pw_{1p} - \theta u_{1p}) \\ \vdots & \ddots & \vdots \\ n^{-1/2}(\rho_1v_{p1} + \rho_pv_{1p} - \rho_1\rho_pw_{p1} - \theta u_{p1}) & \cdots & \rho_p^2 - \theta + n^{-1/2}(2\rho_pv_{pp} - \rho_p^2w_{pp} - \theta u_{pp}) \end{vmatrix} = 0. \qquad (26)$$


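On account of (5) there are $\mu_1$ roots of (26) which are $\rho_1^2 + o(1)$ for large $n$. To
evaluate these we substitute $\rho_1^2 + n^{-1/2}\rho_1\xi$ for $\theta$ in (26). Since the first $\mu_1$ of the
$\rho$'s are equal to $\rho_1$, there will be a common factor $n^{-1/2}\rho_1$ in each of the first $\mu_1$ rows.
After deleting this factor and then letting $n \to \infty$, we get the equation

$$\begin{vmatrix} 2v_{11} - \rho_1(u_{11} + w_{11}) - \xi & \cdots & v_{1\mu_1} + v_{\mu_11} - \rho_1(u_{1\mu_1} + w_{1\mu_1}) \\ \vdots & \ddots & \vdots \\ v_{\mu_11} + v_{1\mu_1} - \rho_1(u_{\mu_11} + w_{\mu_11}) & \cdots & 2v_{\mu_1\mu_1} - \rho_1(u_{\mu_1\mu_1} + w_{\mu_1\mu_1}) - \xi \end{vmatrix} = 0, \qquad (27)$$

i.e.

$$\begin{vmatrix} z_{11} - \xi & \cdots & z_{1\mu_1} \\ \vdots & \ddots & \vdots \\ z_{\mu_11} & \cdots & z_{\mu_1\mu_1} - \xi \end{vmatrix} = 0, \qquad (28)$$

where

$$z_{ij} = v_{ij} + v_{ji} - \rho_1(u_{ij} + w_{ij}) \quad (i, j = 1, \ldots, \mu_1). \qquad (29)$$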


Let $\xi'_1, \ldots, \xi'_{\mu_1}$ be the roots, in descending order of magnitude, of (28). Then the
$\mu_1$ roots of (26) which are $\rho_1^2 + o(1)$ for large $n$ are

$$\rho_1^2 + n^{-1/2}\rho_1\xi'_i + o(n^{-1/2}) \quad (i = 1, \ldots, \mu_1).$$

If we define $\xi_1, \ldots, \xi_{\mu_1}$ by putting

$$\theta_i = \rho_1^2 + n^{-1/2}\rho_1\xi_i \quad (i = 1, \ldots, \mu_1), \qquad (30)$$

then the $\xi$'s have the same limiting distribution as the $\xi'$'s. Hence the limiting
distribution of the $\xi_i$ may be derived as the distribution of the latent roots of
the matrix $\|z_{ij}\|$, in which the $u$'s, $v$'s and $w$'s are regarded as having a distribution
which is the limiting distribution described in Lemma 2.

Now, if all the $u$'s, $v$'s and $w$'s are normal variates with zero mean, so are the
$z$'s. Owing to the fact (ii) of Lemma 2 all the $z$'s are uncorrelated. Using the
formulae (11) to calculate the variances of the $z$'s, we easily obtain

$$E(z_{ii}^2) = 4(1 - \rho_1^2)^2, \quad E(z_{ij}^2) = 2(1 - \rho_1^2)^2 \quad (i \ne j), \qquad (31)$$

$$(i, j = 1, \ldots, \mu_1).$$

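Hence

$$z_{ii} = 2(1 - \rho_1^2)\,t_{ii}, \quad z_{ij} = \sqrt{2}\,(1 - \rho_1^2)\,t_{ij} \quad (i \ne j), \qquad (32)$$

$$(i, j = 1, \ldots, \mu_1),$$

where the $t$'s are mutually independent normal variates with zero mean and unit
standard deviation.

Setting $\xi = 2(1 - \rho_1^2)\eta$ in (28), we get, by (32),

$$\begin{vmatrix} t_{11} - \eta & 2^{-1/2}t_{12} & \cdots & 2^{-1/2}t_{1\mu_1} \\ 2^{-1/2}t_{12} & t_{22} - \eta & \cdots & 2^{-1/2}t_{2\mu_1} \\ \vdots & \vdots & \ddots & \vdots \\ 2^{-1/2}t_{1\mu_1} & 2^{-1/2}t_{2\mu_1} & \cdots & t_{\mu_1\mu_1} - \eta \end{vmatrix} = 0. \qquad (33)$$

Let $\eta'_1, \ldots, \eta'_{\mu_1}$ be the roots of (33) in descending order of magnitude.
The density function

$$(2\pi)^{-\frac14\mu_1(\mu_1+1)} \exp\Big(-\tfrac12\sum_{i=1}^{\mu_1} t_{ii}^2 - \tfrac12\sum_{i<j} t_{ij}^2\Big) \qquad (34)$$

is equal to

$$(2\pi)^{-\frac14\mu_1(\mu_1+1)} \exp\Big(-\tfrac12\sum_{i=1}^{\mu_1} \eta_i'^2\Big), \qquad (35)$$

which is a function of the latent roots only. Hence* the distribution of the latent
roots has the density

$$f(\eta'_1, \ldots, \eta'_{\mu_1};\, \mu_1) = 2^{-\frac12\mu_1} \Big[\prod_{i=1}^{\mu_1} \Gamma^{-1}\{\tfrac12(\mu_1 - i + 1)\}\Big] \Big[\prod_{i=1}^{\mu_1-1}\prod_{j=i+1}^{\mu_1} (\eta'_i - \eta'_j)\Big] \exp\Big(-\tfrac12\sum_{i=1}^{\mu_1} \eta_i'^2\Big), \qquad (36)$$

$$\infty > \eta'_1 \ge \ldots \ge \eta'_{\mu_1} > -\infty.$$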

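The density function (36) represents the limiting distribution of $\eta'_1, \ldots, \eta'_{\mu_1}$, where

$$\theta_i = \rho_1^2 + 2n^{-1/2}\rho_1(1 - \rho_1^2)\,\eta'_i \quad (i = 1, \ldots, \mu_1). \qquad (37)$$

Hence

$$r_i = \theta_i^{1/2} = \rho_1 + n^{-1/2}(1 - \rho_1^2)\,\eta'_i + o(n^{-1/2}) \quad (i = 1, \ldots, \mu_1).$$

If we define $\eta_1, \ldots, \eta_{\mu_1}$ by

$$r_i = \rho_1 + n^{-1/2}(1 - \rho_1^2)\,\eta_i \quad (i = 1, \ldots, \mu_1), \qquad (38)$$

then the $\eta_i = n^{1/2}(1 - \rho_1^2)^{-1}(r_i - \rho_1)$ have the same limiting distribution as the $\eta'_i$.
Hence the limiting distribution of $\eta_1, \ldots, \eta_{\mu_1}$ has the density $f(\eta_1, \ldots, \eta_{\mu_1};\, \mu_1)$.

In exactly the same manner we may prove that for $k = 1, \ldots, \nu$ the

$$\eta_i = n^{1/2}(1 - \rho_i^2)^{-1}(r_i - \rho_i) \quad (i = \mu_1 + \ldots + \mu_{k-1} + 1, \ldots, \mu_1 + \ldots + \mu_k)$$

* Hsu (1939), p. 256, Theorem 2.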


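have the limiting distribution represented by the density

$$f(\eta_{\mu_1+\ldots+\mu_{k-1}+1}, \ldots, \eta_{\mu_1+\ldots+\mu_k};\, \mu_k),$$

where, in general,

$$f(\eta_1, \ldots, \eta_m;\, m) = 2^{-\frac12 m} \Big[\prod_{i=1}^{m} \Gamma^{-1}\{\tfrac12(m - i + 1)\}\Big] \Big[\prod_{i=1}^{m-1}\prod_{j=i+1}^{m} (\eta_i - \eta_j)\Big] \exp\Big(-\tfrac12\sum_{i=1}^{m} \eta_i^2\Big), \qquad (39)$$

$$\infty > \eta_1 \ge \ldots \ge \eta_m > -\infty.$$

Furthermore, the sets $(\eta_1, \ldots, \eta_{\mu_1})$, $(\eta_{\mu_1+1}, \ldots, \eta_{\mu_1+\mu_2})$, ..., $(\eta_{\mu_1+\ldots+\mu_{\nu-1}+1}, \ldots, \eta_s)$
are such that the equations analogous to (27) for two different sets involve
only mutually uncorrelated $u$'s, $v$'s and $w$'s owing to (ii) of Lemma 2. Therefore
the limiting distribution must be such that these sets are independent of one
another. Also, recalling (14) and (i) of Lemma 2, it is seen that the limiting
distribution of $\eta_1, \ldots, \eta_p$ must be such that the sets $(\eta_1, \ldots, \eta_{\mu_1})$, ...,
$(\eta_{\mu_1+\ldots+\mu_{\nu-1}+1}, \ldots, \eta_s)$ and $(\eta_{s+1}, \ldots, \eta_p)$ are independent of one another.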

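In conclusion we sum up the results in the following theorem:

THEOREM. Let the population canonical correlations be $\rho_1, \ldots, \rho_p$, where

$$\begin{aligned} \rho_1 = \ldots = \rho_{\mu_1} &= \rho'_1, \\ \rho_{\mu_1+1} = \ldots = \rho_{\mu_1+\mu_2} &= \rho'_2, \\ &\;\;\vdots \\ \rho_{\mu_1+\ldots+\mu_{\nu-1}+1} = \ldots = \rho_s &= \rho'_\nu, \\ \rho_{s+1} = \ldots = \rho_p &= 0, \end{aligned}$$

$$\rho'_1 > \rho'_2 > \ldots > \rho'_\nu > 0.$$

Let the sample canonical correlations be $r_1, \ldots, r_p$, where

$$r_1 \ge r_2 \ge \ldots \ge r_p.$$

Let

$$\eta_i = n^{1/2}(1 - \rho_i^2)^{-1}(r_i - \rho_i) \quad (i = 1, \ldots, p).$$

Then the limiting distribution of $\eta_1, \ldots, \eta_p$ is represented by the density function

$$f(\eta_1, \ldots, \eta_{\mu_1};\, \mu_1)\; f(\eta_{\mu_1+1}, \ldots, \eta_{\mu_1+\mu_2};\, \mu_2) \ldots f(\eta_{\mu_1+\ldots+\mu_{\nu-1}+1}, \ldots, \eta_s;\, \mu_\nu)\; f_1(\eta_{s+1}, \ldots, \eta_p),$$

where the functions $f$ and $f_1$ are given by (39) and (18) respectively.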
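The simplest instance of the theorem is $p = q = 1$ with a single non-zero $\rho$, for which (39) with $m = 1$ is the unit normal density. The sketch below is a rough Monte Carlo illustration under assumptions that are not in the paper: the values of $\rho$, $n$ and the number of replications are arbitrary, the sums of products are taken about the known zero means, and finite $n$ leaves a small bias in the mean of $\eta$.

```python
import math
import random


def simulate_eta(rho=0.5, n=200, reps=2000, seed=2):
    """Simulated values of eta = sqrt(n) (r - rho) / (1 - rho^2) for p = q = 1."""
    rng = random.Random(seed)
    c = math.sqrt(1.0 - rho * rho)
    etas = []
    for _ in range(reps):
        sxx = syy = sxy = 0.0
        for _ in range(n):
            x = rng.gauss(0.0, 1.0)
            y = rho * x + c * rng.gauss(0.0, 1.0)   # correlation rho with x
            sxx += x * x
            syy += y * y
            sxy += x * y
        r = sxy / math.sqrt(sxx * syy)              # the single sample correlation
        etas.append(math.sqrt(n) * (r - rho) / (1.0 - rho * rho))
    return etas


def mean_and_variance(values):
    """Sample mean and (population-style) variance of a list of numbers."""
    m = sum(values) / len(values)
    v = sum((t - m) ** 2 for t in values) / len(values)
    return m, v


if __name__ == "__main__":
    m, v = mean_and_variance(simulate_eta())
    print(round(m, 3), round(v, 3))   # expected to be near 0 and 1
```

The scaling by $(1 - \rho^2)^{-1}$ is what makes the limiting variance free of $\rho$.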


REFERENCES 


CRAMÉR, H. (1937). Random Variables and Probability Distributions. Camb. Univ. Press.
GIRSCHICK, M. A. (1939). "On the sampling theory of roots of determinantal equations."
Ann. Math. Statist. 10, 206.
HOTELLING, H. (1936). "Relations between two sets of variates." Biometrika, 28, 321-77.
HSU, P. L. (1939). "On the distribution of roots of certain determinantal equations."
Ann. Eugen., Lond., 9, 260-8.
HSU, P. L. (1940). "On the limiting distribution of roots of a determinantal equation." Proc.
Lond. Math. Soc. (at press).
MADOW, W. G. (1938). "Contributions to the theory of multi-variate statistical analysis."
Trans. Amer. Math. Soc. 44, 454.


THE APPLICATION OF MAXIMUM LIKELIHOOD TO
DOSAGE-MORTALITY CURVES


By F. GARWOOD, Ph.D.


1. INTRODUCTION
Many papers have been written on the fitting of dosage-mortality curves; in
particular, a paper by Irwin & Cheeseman (1939) summarizes the methods
which have been adopted hitherto. It is felt, however, that there is some
mathematical interest in the subject which is worth emphasizing.

A typical problem occurs when studying the effect of some drug on a particular 
kind of animal. It is assumed that there is a population of animals, and associated 
with each individual animal is a certain lethal dose of the drug, such that the 
animal would always be killed by a stronger dose and would survive a weaker one. 
There is independent biological evidence for assuming that the logarithms of the 
lethal doses are normally distributed throughout the population, so that if the 
proportion of animals expected to survive a given dose is converted into a probit 
(i.e. an equivalent normal deviate + 5), then the above assumption is equivalent 
to stating that the probits are linearly related to the logs of the doses. If the mean 
(or median) log lethal dose is m and the standard deviation is o, then the linear 

relation between probit and log lethal dose is

$$Y = \alpha + \beta x,$$

where $\sigma = \dfrac{1}{\beta}$ and $m = \dfrac{5 - \alpha}{\beta}$.


The experimental material consists of $k$ groups, drawn at random from the
population, of $n_1, n_2, \ldots, n_k$ animals, which are given doses with logs $x_1, x_2, \ldots, x_k$,
from which there are $s_1, s_2, \ldots, s_k$ survivors, and $n_1 - s_1, n_2 - s_2, \ldots, n_k - s_k$ deaths.

The treatment which has hitherto been applied to data of this kind consists of
obtaining from tables the probits $y_1, y_2, \ldots$ corresponding to the proportions of
survivors $q_1 = s_1/n_1$, $q_2 = s_2/n_2, \ldots$, plotting the $y$'s against the corresponding log
doses $x_1, x_2, \ldots$ and fitting a line to the points, bearing in mind the following
considerations.

Since $dq/dy = -Z$, $Z$ being the ordinate of the normal curve, and as the
variance of $q$ is $PQ/n$, $Q\,(= 1 - P)$ being the expected proportion of survivors, it
follows that the variance of $y$ is $PQ/nZ^2$, which in general varies along the line.
Thus different weights must be used for the various probits in fitting the line. The
effects of using different methods of calculating the weighting coefficient
$w = nZ^2/PQ$ (reciprocal of the variance) have been compared by Irwin &
Cheeseman (1939).




A further difficulty occurred in the cases of zero and all survivors, for which
the corresponding probits are infinite. Fisher's method (Bliss, 1935), using the
method of maximum likelihood, overcame this difficulty by replacing the infinite
probit by a working or fictitious probit in the regression equations.

Mathematically, Fisher's exact method of calculation is as follows (see also
Bliss, 1938 and Fisher & Yates, 1938). Assume rough values $a_1$ and $b_1$ in the
relation

$$Y = a_1 + b_1x.$$

For each value of $x$ this formula gives $P$ and $Q$ (the areas of the normal curve
up to and beyond the probit $Y$), $Z$ (the ordinate at $Y$) and $w = nZ^2/PQ$. The
regression is then found between the variate

$$y = Y + \eta = a_1 + b_1x + \eta, \qquad (1)$$

where

$$\eta = \frac{Q - q}{Z},$$

and $x$, giving weights $w$ to the former. The result is a new regression equation

$$Y = a_2 + b_2x,$$

where

$$b_2 = \frac{Swx(y - \bar{y})}{Sw(x - \bar{x})^2}$$

and

$$a_2 = \bar{y} - b_2\bar{x}.$$

It is to be noted that this form of the regression equation is more convenient 
for our purposes than the form Y — Αα π). Substituting the values of y 
from (1), it follows that

$$b_2 = b_1 + \frac{Swx(\eta - \bar{\eta})}{Sw(x - \bar{x})^2}$$

and $\delta a = a_2 - a_1 = \bar{\eta} - \bar{x}\,(b_2 - b_1)$.

Hence the changes $\delta a = a_2 - a_1$ and $\delta b = b_2 - b_1$ in the regression coefficients of
$y = Y + \eta$ on x are in fact the regression coefficients of η on x, and they can be
regarded as the solutions of the normal equations

$$\delta a\,Sw + \delta b\,Swx = Sw\eta, \qquad (2)$$
$$\delta a\,Swx + \delta b\,Swx^2 = Swx\eta. \qquad (3)$$

The new regression equation $Y = a_2 + b_2 x$ is then made the basis of a similar
calculation; i.e. the regression is calculated between $a_2 + b_2 x + \eta$ and x (the values
of η will be changed since Q and Z are in general altered), giving another equation
$Y = a_3 + b_3 x$ and so on. The process is continued until no change occurs in the
coefficients.

It is possible that some arithmetical labour might be saved by obtaining
the corrections δa, δb to the regression coefficients a, b at each stage, instead of
the new coefficients a + δa, b + δb, by calculating the regression between η and x,
but this has not been investigated. The process of obtaining the corrections is
illustrated later (Table III).
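The cycle described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the arithmetic layout used in the paper: the function names (`normal_parts`, `method_two_step`, `fit_method_two`) are invented here, the normal areas are taken from `math.erfc` rather than from printed tables, and the data are those of Example (ii) given later (seven groups of 5 mice, x = -3, ..., 3, with 4, 3, 2, 0, 0, 0, 0 survivors), starting from Y = 7 + x. Table III of the paper reports first corrections of about δa = -0.6475 and δb = -0.2034 for this case.

```python
import math

def normal_parts(Y):
    """Areas P, Q below and above the probit Y, and the ordinate Z at Y."""
    u = Y - 5.0                                    # probit = normal deviate + 5
    P = 0.5 * math.erfc(-u / math.sqrt(2.0))
    Q = 0.5 * math.erfc(u / math.sqrt(2.0))
    Z = math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)
    return P, Q, Z

def method_two_step(a, b, x, n, s):
    """Solve the normal equations (2) and (3) for the corrections (da, db)."""
    Sw = Swx = Swxx = Swe = Swxe = 0.0
    for xi, ni, si in zip(x, n, s):
        P, Q, Z = normal_parts(a + b * xi)
        w = ni * Z * Z / (P * Q)                   # weighting coefficient
        eta = (Q - si / ni) / Z                    # working adjustment, eq. (1)
        Sw += w
        Swx += w * xi
        Swxx += w * xi * xi
        Swe += w * eta
        Swxe += w * xi * eta
    det = Sw * Swxx - Swx * Swx
    da = (Swe * Swxx - Swx * Swxe) / det
    db = (Sw * Swxe - Swx * Swe) / det
    return da, db

def fit_method_two(a, b, x, n, s, tol=1e-10, max_cycles=100):
    """Repeat the cycle until the corrections are negligible."""
    for _ in range(max_cycles):
        da, db = method_two_step(a, b, x, n, s)
        a, b = a + da, b + db
        if abs(da) < tol and abs(db) < tol:
            break
    return a, b

# Example (ii): q is the proportion of survivors, first approximation Y = 7 + x.
x = [-3, -2, -1, 0, 1, 2, 3]
n = [5] * 7
s = [4, 3, 2, 0, 0, 0, 0]

first_da, first_db = method_two_step(7.0, 1.0, x, n, s)
a_hat, b_hat = fit_method_two(7.0, 1.0, x, n, s)
print("first corrections:", round(first_da, 4), round(first_db, 4))
print("limiting line: Y = %.4f + %.4f x" % (a_hat, b_hat))
```

Groups with no survivors cause no difficulty here, since η remains finite even when the observed probit is infinite.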


48 Maximum likelihood and dosage-mortality curves 


In general the successive coefficients $a_n$ and $b_n$ will converge to limits (and the
successive corrections δa, δb to zero), and it is not difficult to see that these limits
are the solutions of the maximum likelihood equations. The process is, in fact, the
same as the general method suggested by Fisher for solving maximum likelihood
equations, as will be shown later. Also, a consideration of the foundations of this
method shows that there is a method of obtaining the maximum likelihood
estimates which is slightly more rapid (as regards numbers of approximations)
than that outlined above.

It is one of the objects of this note to point out that the problem may be
regarded as one of estimating the parameters m and σ (or equally, α and β),
i.e. of fitting a normal curve to the data. The fact that this is equivalent
to fitting a line to the theoretical relation between probits and log lethal doses is
only a consequence of the special nature of the normal distribution. It may happen
in other applications that the distribution of the log lethal doses is not normal and
cannot be normalized by any transformation of the log lethal doses but has
another form depending on unknown parameters; then the problem can only be
regarded in general as the estimation of parameters from observations.

In the case of the normal distribution the method of plotting probits is of
course a very convenient method of representing the data and of obtaining a good
general picture, but it is to be emphasized that from the theoretical viewpoint it is
at least equally important to interpret the problem as one of estimating parameters
as to regard it as a problem of fitting a regression line in the ordinary sense of the
term.


2. GENERAL MAXIMUM LIKELIHOOD ESTIMATES 


It will be convenient to recapitulate the method used by Fisher for solving
maximum likelihood equations (see, e.g., Koshal, 1933). A sample $x_1, x_2, \ldots, x_n$
is drawn at random from a population of which the frequency function has a
known form depending on s unknown parameters $\theta_1, \theta_2, \ldots, \theta_s$, so that the
probability of obtaining the sample is

$$P(x_1, x_2, \ldots, x_n;\ \theta_1, \theta_2, \ldots, \theta_s).$$

If the variates x are independent of each other, as in the case of successive samples
from the same population, then P is the product of n probability functions
$f(x_1; \theta_1, \theta_2, \ldots)$, $f(x_2; \theta_1, \theta_2, \ldots)$, .... On the other hand, they will not be
independent if they are a set of frequencies with a fixed total.

The maximum likelihood estimates of the unknown parameters $\theta_1, \theta_2, \ldots$,
based on the information provided by the sample, are the values which satisfy
the maximum likelihood equations $\partial P/\partial\theta_1 = 0$, $\partial P/\partial\theta_2 = 0$, .... Using the
likelihood function

$$L = L(x_1, x_2, \ldots;\ \theta_1, \theta_2, \ldots) = \log P,$$

the estimates must be solutions of $\partial L/\partial\theta_1 = 0$, etc.


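Since they are functions of the sample, these solutions can be written
$T_1 = T_1(x_1, x_2, \ldots)$, $T_2$, ..., etc.; if approximations $t_1, t_2, \ldots$ to $T_1, T_2, \ldots$ are found by
some rough method, suppose that $T_1 = t_1 + \delta t_1$, etc., so that $\delta t_1, \delta t_2, \ldots$ are the
errors in the approximations. Then we have

$$0 = \frac{\partial L}{\partial\theta_1}(T_1, T_2, \ldots) = \frac{\partial L}{\partial\theta_1}(t_1, t_2, \ldots) + \delta t_1\,\frac{\partial^2 L}{\partial\theta_1^2}(t_1, t_2, \ldots) + \delta t_2\,\frac{\partial^2 L}{\partial\theta_1\,\partial\theta_2}(t_1, t_2, \ldots) + \ldots \qquad (4)$$

etc., ignoring terms in $\delta t_1^2$, etc. Writing

$$\frac{\partial L}{\partial\theta_i}(t_1, t_2, \ldots) = L_i, \qquad \frac{\partial^2 L}{\partial\theta_i\,\partial\theta_j}(t_1, t_2, \ldots) = L_{ij}, \text{ etc.},$$

the equations become $L_{11}\,\delta t_1 + L_{12}\,\delta t_2 + \ldots = -L_1$, etc.,

or $$\sum_{j=1}^{s} L_{ij}\,\delta t_j = -L_i \quad (i = 1, \ldots, s). \qquad (5)$$

The solutions of these equations can be written

$$\delta t_i = -\sum_{j=1}^{s} L^{ij} L_j \quad (i = 1, \ldots, s),$$

where, in matrix notation, $\{L^{ij}\} = \{L_{ij}\}^{-1}$, the reciprocal matrix of $\{L_{ij}\}$. Thus if
Δ is the determinant formed by the elements $L_{ij}$, and $\Delta_{ij}$ the determinant obtained
by omitting row i and column j, then

$$L^{11} = \frac{\Delta_{11}}{\Delta}, \quad L^{12} = -\frac{\Delta_{12}}{\Delta}, \quad L^{13} = \frac{\Delta_{13}}{\Delta}, \quad \ldots, \quad L^{ij} = (-1)^{i+j}\frac{\Delta_{ij}}{\Delta}.$$

For two variables

$$L^{11} = \frac{L_{22}}{\Delta}, \quad L^{12} = -\frac{L_{12}}{\Delta}, \quad L^{22} = \frac{L_{11}}{\Delta}, \quad \Delta = L_{11}L_{22} - L_{12}^2,$$

and $$\delta t_1 = -\frac{1}{\Delta}(L_1 L_{22} - L_2 L_{12}), \qquad \delta t_2 = -\frac{1}{\Delta}(L_2 L_{11} - L_1 L_{12}).$$

The corrections δt will not be exact, since terms of higher order have been omitted
from (4); however, if the process is repeated with $t_1 + \delta t_1$, $t_2 + \delta t_2$, ..., now used as
the first approximations, the corrections obtained will be of the next order of
smallness, and the process can be continued until the approximations t are as
near as desired to the exact estimates T.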

The coefficients $L_{ij}$ are functions of the observations as well as the
approximations t; it is in some practical cases more convenient to replace the x's by their
expected values, i.e. the values which they would be expected to have if $\theta_1 = t_1$,
$\theta_2 = t_2$, .... Let $\Lambda_{ij}$ be the resulting values of $L_{ij}$, so that equations (5) become

$$\sum_{j=1}^{s} \Lambda_{ij}\,\delta t_j = -L_i \quad (i = 1, \ldots, s)$$

with solutions $$\delta t_i = -\sum_{j=1}^{s} \Lambda^{ij} L_j \quad (i = 1, \ldots, s),$$

where $\{\Lambda^{ij}\} = \{\Lambda_{ij}\}^{-1}$.
As before, the corrected values t + δt can be used as the basis of the new calculation
and the process repeated to any desired accuracy. It is to be expected that the
replacement of $L_{ij}$ by its expected value $\Lambda_{ij}$ will result in the approximation being
less rapid; this is confirmed in particular examples by calculations given below,
the difference being small, however.

The method has been used by Koshal (1933, 1939) in fitting a Pearson Type I
curve

$$y = y_0\,(x - \alpha)^{\mu_1}(\beta - x)^{\mu_2}$$

to a set of frequency data. First approximations to the maximum likelihood
estimates of the four unknown parameters $\alpha, \beta, \mu_1, \mu_2$ (or $\theta_1, \theta_2, \theta_3, \theta_4$) were found
by actually calculating a set of values of L and estimating its maximum position.
The above method was then applied. In this case, if a typical group frequency is n
and the expected proportion is p, we have, S denoting summation over the groups,

$$L = \text{constant} + S\,n\log p, \qquad L_i = S\,\frac{n}{p}\,p_i,$$

and $$L_{ij} = -S\,\frac{n}{p^2}\,p_i p_j + S\,\frac{n}{p}\,p_{ij}.$$

As before, $p_i$ denotes $\partial p/\partial\theta_i$ and $p_{ij}$ denotes $\partial^2 p/\partial\theta_i\,\partial\theta_j$. The expected value of the
last term is

$$N\,S\,p_{ij} = 0,$$

where N = Sn = total frequency (the expected value of n being Np, and Sp = 1 for all values of the parameters), so that

$$\Lambda_{ij} = -N\,S\,\frac{p_i p_j}{p},$$

which were the values used by Koshal.

The covariance matrix* of the estimates T is approximately equal to $-\{\Lambda^{ij}\}$
(Fisher, 1922); thus the variance of $T_1$ is approximately $-\Lambda^{11}$ and the covariance
of $T_1$ and $T_2$ is approximately $-\Lambda^{12}$. The degree of approximation is such that
terms of a higher order in 1/n are omitted.

To apply the method to dosage-mortality problems, suppose the probability 
of death is a function P of the dose (or log dose) x and of unknown parameters 

* This has been used by Irwin & Cheeseman (1939) to derive the formulae for the 


variances and covariances of the estimates a and b of the parameters in the lethal dose 
distribution. 


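$\theta_1, \theta_2, \ldots$. The combined probability of the given set of survivors is

$$\prod \frac{n!}{s!\,(n - s)!}\,P^{\,n-s} Q^{\,s},$$

so that $$L = \text{const.} + S[(n - s)\log P + s\log Q],$$

and $$L_i = S\,\frac{n(Q - q)}{PQ}\,P_i,$$

where q = s/n = observed proportion of survivors; therefore

$$L_{ij} = S\,\frac{n(Q - q)}{PQ}\,P_{ij} - S\,n\left(\frac{1 - q}{P^2} + \frac{q}{Q^2}\right)P_i P_j,$$

and $$\Lambda_{ij} = -S\,\frac{n}{PQ}\,P_i P_j.$$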

If the distribution of lethal doses (or log lethal doses) depends only on
parameters of scale and location, we have

$$P = P(\alpha + \beta x) = P(Y);$$

therefore $P_\alpha = P'(\alpha + \beta x) = Z$, and $P_\beta = xZ$,
where Z is the ordinate of the frequency distribution; thus

$$L_\alpha = S\,\frac{nZ(Q - q)}{PQ}, \qquad L_\beta = S\,x\,\frac{nZ(Q - q)}{PQ}.$$

Putting $$\frac{nZ(Q - q)}{PQ} = \zeta\ (= w\eta) \quad\text{and}\quad \frac{nZ^2}{PQ} = w,$$

we have $L_\alpha = S\zeta$, $L_\beta = Sx\zeta$.
Also $L_{\alpha\alpha} = S\zeta'$, $L_{\alpha\beta} = Sx\zeta'$, $L_{\beta\beta} = Sx^2\zeta'$,

i.e. $$L_{\alpha\alpha} = S\,\frac{nZ(Q - q)}{PQ}\left(\frac{Z'}{Z} - \frac{Z}{P} + \frac{Z}{Q}\right) - S\,\frac{nZ^2}{PQ} = -Sw + S\zeta\mu, \qquad (6)$$

where $$\mu = \frac{Z'}{Z} - \frac{Z}{P} + \frac{Z}{Q}.$$

Similarly $$L_{\alpha\beta} = -Swx + Sx\zeta\mu, \qquad (7)$$
$$L_{\beta\beta} = -Swx^2 + Sx^2\zeta\mu. \qquad (8)$$

The expected value of ζ is zero, so that

$$\Lambda_{\alpha\alpha} = -Sw, \qquad \Lambda_{\alpha\beta} = -Swx, \qquad \Lambda_{\beta\beta} = -Swx^2.$$

The equations for the corrections δa, δb to approximations $a_1, b_1$ to the
maximum likelihood estimates, using the "expected" coefficients Λ, are thus

$$\delta a\,Sw + \delta b\,Swx = Sw\eta,$$
$$\delta a\,Swx + \delta b\,Swx^2 = Swx\eta,$$

i.e. the same as equations (2) and (3). Thus Fisher's exact maximum likelihood
method of correcting the regression line is exactly equivalent, in the case of any
distribution defined by parameters of scale and position, to the above method of
calculating successive approximations to the maximum likelihood solutions. In
the case of the normal distribution, it should be noted that if a and b are the
maximum likelihood estimates of α and β, then $\bar{x} = (5 - a)/b$ and $s = 1/b$ are the
maximum likelihood estimates of the mean and standard deviation m and σ,
since it is easy to see that these values satisfy $\partial L/\partial m = 0$, $\partial L/\partial\sigma = 0$. In other
words, the problem can equally be regarded as the estimation of the population
mean and standard deviation from the data.

Furthermore, it may happen from some peculiarity of the experimental
material that each experiment consists of one item, i.e. $n_1 = n_2 = \ldots = 1$, and
the number of survivors is either 0 or 1. For example, the dose might represent
some quantity which can be measured but not controlled.

Provided always there is independent evidence on which to base the assumption
of the normal distribution (or of some other known form), there appears to
be no reason why such data should not be effective for the purpose of drawing
inferences about the population.

It is true that for the purpose of testing the hypothesis, say of normality, it
will be necessary to group the data, and this will also be efficacious in obtaining a
provisional probit line, i.e. first approximations to a and b; but for the exact
estimation of the population parameters this is unnecessary (and would, in fact,
result in a loss of information), for there is no difficulty in carrying out the
calculations given above, or illustrated later in Tables II and III, with the values
of q equal to 0 or 1. On the other hand, the exact problem cannot be regarded as
one of fitting a line to the plotted probits, since all the latter are infinite in one
direction or the other.

Another convenient way of regarding the process of finding the maximum
likelihood estimates is a geometrical one. We require to find the values a and b of
α and β which are such that $L_\alpha = \partial L(\alpha, \beta)/\partial\alpha$ and $L_\beta = \partial L(\alpha, \beta)/\partial\beta$ are zero.
Regarding α, β, γ as cartesian co-ordinates in three-dimensional space, we require
to find the point P(a, b, 0) where the two surfaces $\gamma = L_\alpha$, $\gamma = L_\beta$ and the plane
γ = 0 meet. This is the same as the point of intersection of the curve $L_\alpha = 0$ in
the plane γ = 0 (the horizontal plane) and the curve $L_\beta = 0$ in the same plane.
If $P_1(a_1, b_1, 0)$ is an approximation to P, the tangent plane to the first surface at
the point vertically above $P_1$ cuts the horizontal plane in a line which is near the
first curve.

Using co-ordinates δa, δb with reference to $P_1$ as origin, this line has equation

$$\frac{\partial L}{\partial\alpha} + \delta a\,\frac{\partial^2 L}{\partial\alpha^2} + \delta b\,\frac{\partial^2 L}{\partial\alpha\,\partial\beta} = 0,$$

or in the simpler notation

$$L_\alpha + \delta a\,L_{\alpha\alpha} + \delta b\,L_{\alpha\beta} = 0.$$

Similarly there is a line near the second curve with the equation

$$L_\beta + \delta a\,L_{\alpha\beta} + \delta b\,L_{\beta\beta} = 0$$

and the intersection $P_2$ of these lines is a closer approximation to P than $P_1$. The
values of $L_{\alpha\alpha}$, etc. are given in equations (6), (7) and (8) and the effect of replacing
them by their expected values $\Lambda_{\alpha\alpha}$, etc. is to replace the tangent planes by planes
through the points of contact differing slightly in direction, and to replace the
above lines by lines whose equations are given by (2) and (3).


3. THE GOODNESS OF FIT TEST 


In the regression line treatment of the problem, the test of the hypothesis of
normality is provided by testing the residual variance χ² about the regression line,
with degrees of freedom two less than the number of groups. From the point of
view we are considering, it would appear more natural to calculate χ² as in the
comparison of observed and expected frequencies, i.e.

$$\chi^2 = S\left\{\frac{(n - s - nP)^2}{nP} + \frac{(s - nQ)^2}{nQ}\right\} = S\,\frac{n(Q - q)^2}{PQ} = S\,\frac{nZ^2}{PQ}\left(\frac{Q - q}{Z}\right)^2 = Sw\eta^2,$$

with k - 2 degrees of freedom.
The residual variance about the regression line is

$$Sw(\eta - \delta a - x\,\delta b)^2 = Sw\eta^2 - \delta a\,Sw\eta - \delta b\,Swx\eta,$$

so that the two values of χ² are identical when δa = 0, δb = 0, i.e. when the
maximum likelihood estimates have been approached sufficiently closely.
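The identity between the frequency form of χ² and Swη² can be checked numerically. The sketch below is an assumption-laden illustration rather than part of the paper: the function names are invented, the normal areas come from `math.erfc`, and the data are those of Example (ii) given later (q being the proportion of survivors), evaluated at the first approximation Y = 7 + x.

```python
import math

def normal_parts(Y):
    """Areas P, Q below and above the probit Y, and the ordinate Z at Y."""
    u = Y - 5.0
    P = 0.5 * math.erfc(-u / math.sqrt(2.0))
    Q = 0.5 * math.erfc(u / math.sqrt(2.0))
    Z = math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)
    return P, Q, Z

def chi2_frequencies(a, b, x, n, s):
    """Observed vs expected frequencies of deaths and survivors."""
    total = 0.0
    for xi, ni, si in zip(x, n, s):
        P, Q, _ = normal_parts(a + b * xi)
        total += (ni - si - ni * P) ** 2 / (ni * P) + (si - ni * Q) ** 2 / (ni * Q)
    return total

def chi2_weighted(a, b, x, n, s):
    """The same quantity written as S w eta^2."""
    total = 0.0
    for xi, ni, si in zip(x, n, s):
        P, Q, Z = normal_parts(a + b * xi)
        w = ni * Z * Z / (P * Q)
        eta = (Q - si / ni) / Z
        total += w * eta * eta
    return total

# Example (ii) data, evaluated at the first approximation Y = 7 + x.
x = [-3, -2, -1, 0, 1, 2, 3]
n = [5] * 7
s = [4, 3, 2, 0, 0, 0, 0]
chi2_a = chi2_frequencies(7.0, 1.0, x, n, s)
chi2_b = chi2_weighted(7.0, 1.0, x, n, s)
print(chi2_a, chi2_b)   # the two forms agree up to rounding error
```

The agreement holds at any trial values of a and b; it is the equality with the residual variance about the fitted line that requires δa = δb = 0.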


4. COMPARISON OF METHODS OF SOLVING MAXIMUM LIKELIHOOD EQUATIONS 
For convenience we refer to the method using equations $\sum_{j=1}^{s} L_{ij}\,\delta t_j = -L_i$ as
method I and that using the expected values of the coefficients, viz.

$$\sum_{j=1}^{s} \Lambda_{ij}\,\delta t_j = -L_i,$$

as method II. Before comparing them arithmetically, it is of interest to enquire
whether the two are ever identical, i.e. $L_{ij} = \Lambda_{ij}$, for a "scale-location"
distribution. From (6), (7) and (8) this requires that

$$\mu = \frac{Z'}{Z} - \frac{Z}{P} + \frac{Z}{Q} = 0.$$

Now $Z = dP/dY = P'$, so that the probability integral P must satisfy the
differential equation

$$P'' - \frac{P'^2}{P} + \frac{P'^2}{1 - P} = 0.$$

An integral of $P'' + P'^2 f(P) = 0$
is $P'\,e^{\int f(P)\,dP} = C$,
so that $P' = CP(1 - P)$,

and hence $$P = \frac{e^{CY + B}}{1 + e^{CY + B}}.$$

Since Y = α + βx, we can choose values B = 0, C = 2 for the arbitrary
constants, giving

$$P = \frac{e^{2Y}}{1 + e^{2Y}}$$

and ordinate $Z = P' = \tfrac{1}{2}\,\mathrm{sech}^2\,Y$, which is the distribution given by Fisher &
Yates (1938). Thus for this distribution μ = 0 and $L_{\alpha\alpha} = \Lambda_{\alpha\alpha}$, $L_{\alpha\beta} = \Lambda_{\alpha\beta}$,
$L_{\beta\beta} = \Lambda_{\beta\beta}$; hence equations (2) and (3) will give the most rapid solution.
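The vanishing of μ for this distribution, and its failure to vanish for the normal curve, can be confirmed numerically. The following is a small sketch with invented function names; it evaluates μ = Z'/Z - Z/P + Z/Q directly from the closed forms of P, Q, Z and Z' for each distribution.

```python
import math

def mu_sech2(Y):
    """mu for P = e^{2Y} / (1 + e^{2Y}), ordinate Z = (1/2) sech^2 Y."""
    P = 1.0 / (1.0 + math.exp(-2.0 * Y))
    Q = 1.0 / (1.0 + math.exp(2.0 * Y))
    Z = 0.5 / math.cosh(Y) ** 2
    Z_prime = -math.tanh(Y) / math.cosh(Y) ** 2      # dZ/dY
    return Z_prime / Z - Z / P + Z / Q

def mu_normal(Y):
    """mu for the normal curve, Y being a probit (normal deviate + 5)."""
    u = Y - 5.0
    P = 0.5 * math.erfc(-u / math.sqrt(2.0))
    Q = 0.5 * math.erfc(u / math.sqrt(2.0))
    Z = math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)
    return -u - Z / P + Z / Q                         # Z'/Z = -(Y - 5)

sech2_values = [mu_sech2(0.5 * k) for k in range(-6, 7)]
normal_value = mu_normal(6.0)
print(max(abs(v) for v in sech2_values))   # zero up to rounding error
print(normal_value)                        # clearly non-zero
```

For the normal curve μ is zero only at the median (Y = 5), which is why methods I and II give different corrections in the examples that follow.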

Tables for facilitating these calculations have been given by Fisher & Yates
(1938); it would appear that Tables XII-XIV of this work supply similar tables
for fitting a distribution of the type $P = \sin^2\phi = \sin^2(\alpha + \beta x)$; here

$$\mu = -2\cot 2\phi, \qquad w = 4n, \qquad \zeta = 4n(Q - q)\,\mathrm{cosec}\,2\phi,$$

so that $L_{\alpha\alpha} = -4N - 8\,S\,n(Q - q)\cot 2\phi\,\mathrm{cosec}\,2\phi$, etc.,
and any advantage in rapidity gained by using method I is almost certainly offset
by the simplicity in method II, since $\Lambda_{\alpha\alpha} = -4N$, $\Lambda_{\alpha\beta} = -4\,Snx$, $\Lambda_{\beta\beta} = -4\,Snx^2$.
The point has not been tested, since no examples have come to hand in which
a $P = \sin^2\phi$ distribution has been envisaged.

To apply method I to the normal distribution we have ordinate

$$Z = \frac{1}{\sqrt{2\pi}}\,e^{-\frac{1}{2}(Y - 5)^2},$$

therefore $$\mu = \frac{Z'}{Z} - \frac{Z}{P} + \frac{Z}{Q} = 5 - Y - \frac{Z}{P} + \frac{Z}{Q}.$$

The two methods of successive approximation have been applied to each of
the following three sets of data taken more or less at random from those already
published for illustrative purposes.

Example (i). Antipneumococcus serum given to five groups of forty mice;
Wilson Smith's data (Irwin & Cheeseman, 1939, p. 179).

    Serum dose (c.c.)     x     Deaths out of 40
    0.000625             -2           33
    0.00125              -1           22
    0.0025                0            8
    0.005                 1            6
    0.01                  2            2

Since the increased dose resulted in more survivors, we call q the proportion of
deaths.

Example (ii). Mice injected with Bact. Typhi-murium, sample A; Topley's
data (Irwin & Cheeseman, 1939, p. 180).

    Dose (mg)     x     Survivors out of 5
    0.0625       -3            4
    0.125        -2            3
    0.25         -1            2
    0.5           0            0
    1.0           1            0
    2.0           2            0
    4.0           3            0


Example (iii). Brine shrimps, Artemia salina, in arsenical solutions having
concentrations in geometrical progression (Fisher & Yates, 1938, p. 5).

    Solution     x     Survivors out of 8
    [entries not recoverable]

In each case the scale of x has been chosen with a central origin for convenience.
The last example was used by Fisher & Yates (1938) to illustrate the use of a
table (Table XI) drawn up for solving these problems; as, however, the
arithmetical work in this note has, for purposes of comparison, been taken to more
places of decimals than practical work demands, this table has not been used, and
the requisite areas and ordinates of the normal curve have been taken from tables
of the normal curve (Pearson, 1930). The results of the comparison are shown in
Table I.

Most of the values given are probably exact; there may, however, be an
occasional error of one unit in the last place. The first approximation in Example (i)
is one used by Irwin & Cheeseman; as it has already been calculated from the
data, the errors in the coefficients are smaller than in Examples (ii) and (iii).
In Example (ii) the first approximation was found by fitting a line by eye to the
observed probits, and in Example (iii) the first approximation was found by a
preliminary regression calculation with the coefficients rounded off to one place
of decimals. The arithmetical details of the calculations (Example (ii)) of the 2nd
approximation by the two methods are given for illustration in Tables II and III.

Table I. Comparison of methods of successive approximation
to solutions of maximum likelihood equations

[Table I. Columns: Example; Approximation (First, Final); Successive corrections
to a and b under Method II (δa, δb) and under Method I (δa, δb). Entries not
recoverable.]

Method II. Expected values of $\partial^2 L/\partial\alpha^2$, etc. used.
Method I. Actual values of $\partial^2 L/\partial\alpha^2$, etc. used.

It is seen from a comparison of the results that there is a slight advantage,
from the point of view of rapidity, in method I. On the other hand, method II
entails a little less arithmetical work, and if Fisher & Yates' tables are used it is
probable that this advantage would be greater, although the point has not been
investigated.


















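Table II. Typical calculation of corrections to estimates of
parameters, Example (ii), method I

1st approximation, Y = 7 + x

[Table II. Row-by-row entries not recoverable; the totals and the resulting
corrections follow.]

$L_\alpha = S\zeta = -0.46228$,  $L_\beta = Sx\zeta = 0.53662$
$L_{\alpha\alpha} = S\zeta' = -1.7629$,  $L_{\alpha\beta} = Sx\zeta' = 3.1704$,  $L_{\beta\beta} = Sx^2\zeta' = -7.2120$

$$\delta a = \frac{L_\beta L_{\alpha\beta} - L_\alpha L_{\beta\beta}}{L_{\alpha\alpha}L_{\beta\beta} - L_{\alpha\beta}^2} = -0.6133, \qquad \delta b = \frac{L_\alpha L_{\alpha\beta} - L_\beta L_{\alpha\alpha}}{L_{\alpha\alpha}L_{\beta\beta} - L_{\alpha\beta}^2} = -0.1952$$

a = 7,  b = 1
a + δa = 6.3867,  b + δb = 0.8048

2nd approximation, Y = 6.3867 + 0.8048x. Final approximation, Y = 6.5168 + 0.8635x.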


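Table III. Typical calculation of corrections to estimates of
parameters, Example (ii), method II

1st approximation, Y = 7 + x

[Table III. Row-by-row entries not recoverable; the totals and the resulting
corrections follow.]

$Swx\eta = 0.5366$,  $Swx^2 = 6.9494$,  $Sw = 1.6601$,  $Swx = -3.0118$
$-\bar{x}\,Sw\eta = -0.8387$,  $\bar{x}\,Swx = 5.4640$,  $\bar{x} = -1.8142$
$Sw\eta(x - \bar{x}) = -0.3021$,  $Sw(x - \bar{x})^2 = 1.4854$,  $Sw\eta = -0.4623$
$\delta b = Sw\eta(x - \bar{x})/Sw(x - \bar{x})^2 = -0.2034$,  $\bar{\eta} = -0.2785$
$-\bar{x}\,\delta b = -0.3690$
$\delta a = \bar{\eta} - \bar{x}\,\delta b = -0.6475$

a = 7,  b = 1
a + δa = 6.3525,  b + δb = 0.7966

2nd approximation, Y = 6.3525 + 0.7966x. Final approximation, Y = 6.5168 + 0.8635x.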


* Five significant figures were used for P, Q and Z, where possible, but only five places
of decimals are shown in the table; similarly in Table III.

† As n = 5 in each sample, it has been omitted for convenience from ζ, ζ′ and from
w, wx in Table III.




5. SUMMARY 


The usual practical maximum likelihood treatment of dosage-mortality
problems (consisting of the transformation of percentage surviving into probits,
adjustment and weighting of the latter, and calculation of successive regression
lines) is shown to be equivalent to calculating successive corrections to the
regression coefficients. The process is exactly equivalent to the method, given
elsewhere by Fisher, of obtaining the maximum likelihood estimates of the
parameters defining the distribution. A refinement of this method, using the actual
values of the second derivatives of the likelihood function, instead of the expected
values, converges a little more rapidly when applied to the normal distribution,
but this advantage is offset by some extra arithmetical work. The two methods
are exactly equivalent only for the distribution with ordinate $\tfrac{1}{2}\,\mathrm{sech}^2\,x$.


The writer is greatly indebted to Mr E. D. van Rest and to Prof. R. A. Fisher 
for much useful help and advice. 


REFERENCES 


BLISS, C. I. (1935). Ann. Appl. Biol. 22, 134.

— — (1938). Quart. J. Pharm. 11, 192.

FISHER, R. A. (1922). Philos. Trans. A, 222, 309.

FISHER, R. A. & YATES, F. (1938). Statistical Tables for Biological, Agricultural and Medical
Research. Edinburgh & London: Oliver & Boyd.

IRWIN, J. O. & CHEESEMAN, E. A. (1939). J. R. Statist. Soc. Suppl. 6, 174.

KOSHAL, R. S. (1933). J. R. Statist. Soc. 96, 303.

— — (1939). Ann. Eugen., Lond., 9, 209.

PEARSON, K. (1930). Tables for Statisticians and Biometricians, Part 1, 3rd ed., Biometrika
Office.


A NOTE ON FURTHER PROPERTIES OF . 
STATISTICAL TESTS 


By E. S. PEARSON 


Dr P. L. Hsu has suggested that I should write a short introductory note on
the origin of the idea involved in his paper and in that of Dr Simaika's which
follows.* In searching some twelve years ago for a systematic method of choosing
the best test of a statistical hypothesis $H_0$, Prof. Neyman and I came to the
conclusion that an essential preliminary to any mathematical formulation of
the problem was the definition of a set of admissible alternative hypotheses,
C(H). Starting from this viewpoint, our first method of selecting a test involved
the use of the likelihood ratio, but, however useful as a practical method of
attack, the principle underlying this approach was somewhat arbitrary. A more
fundamental procedure, later developed, was to choose a test paying regard
to its power function, that is to say, to the chance that its use would lead to the
rejection of $H_0$ if an alternative $H \neq H_0$ of C(H) were true. It then appeared
that a number of statistical tests in common use had the remarkable property
that they maximized this chance for every alternative to $H_0$ in C(H). Such
tests were termed uniformly most powerful tests of $H_0$ with regard to C(H).

That there were limitations to the situations in which a uniformly most
powerful test could exist soon, however, became clear. These limitations were
gradually explored, and the following papers are further contributions to the
subject. It was found that these tests generally, though not always,
concerned the value of a single parameter. Such are tests of the hypothesis that
a mean or a standard deviation has a specified value, or that the difference
between two means or two standard deviations is zero. Further, in these cases
the class of alternatives must be restricted; thus the two-sample t-test of the
hypothesis that two population means $\xi_1$ and $\xi_2$ are equal, is only uniformly
most powerful for the situation in which the alternatives considered are defined
by $\xi_1 - \xi_2 > 0$ or by $\xi_1 - \xi_2 < 0$ but not for both at the same time.

In this connexion, in 1935, Kołodziejczyk was able to prove that for tests
of a linear hypothesis, no uniformly most powerful test could exist if the
number of parameters involved was greater than unity. This result was
important, since the majority of tests used in the analysis of variance can be
reduced to tests of a linear hypothesis.

This limitation of tests regarding the value of two or more parameters can
be illustrated by a geometric presentation. Since the most important features
of the problem can be illustrated when $H_0$ is a simple hypothesis concerning
the value of two parameters, I shall take this case, using notation already
adopted in this connexion.


* See pp. 62-69 and pp. 70-80 below. 


60 - Further properties of statistical tests 


Suppose that the elementary probability law of random variables $x_1, \ldots, x_n$,
whose particular values are given by observation, is of form

$$p(x_1, \ldots, x_n \mid \theta_1, \theta_2) = p(E \mid \theta_1, \theta_2), \qquad (1)$$

$\theta_1, \theta_2$ being the two population parameters. For a critical region w of size α
associated with a given test, we may write

$$P\{E \in w \mid \theta_1, \theta_2\} = \int \cdots \int_w p(E \mid \theta_1, \theta_2)\,dx_1 \ldots dx_n = \beta(\theta_1, \theta_2 \mid w). \qquad (2)$$

If the hypothesis $H_0$ which w has been selected to test assumes that

$$\theta_1 = \theta_1^0, \qquad \theta_2 = \theta_2^0, \qquad (3)$$

then $$\beta(\theta_1^0, \theta_2^0 \mid w) = \alpha, \qquad (4)$$

where α is the significance level chosen.

A power surface may be obtained by taking rectangular axes for $\theta_1$ and $\theta_2$
in a horizontal plane and plotting $\beta(\theta_1, \theta_2 \mid w)$ as a vertical ordinate. If $w_0$ were
a critical region associated with a uniformly most powerful test of $H_0$, then its
power surface would fall nowhere below the surfaces derived from other critical
regions satisfying (4). No unique surface with this property will, however, in
general exist. If, for instance, we choose w so that the surface will rise quickly
in the direction parallel to the axis of $\theta_1$, we shall reduce the rate of increase in
the direction of $\theta_2$, and vice versa. Power surfaces of alternative critical regions
may, in fact, cross one another in a complicated way, but no single surface can
everywhere lie above all others. If we confine attention to tests for which the
power surface has a minimum ordinate of α at the point $\theta_1^0, \theta_2^0$, i.e. to unbiased
tests of $H_0$, we shall still be unable to find a uniformly most powerful test in
this restricted field.

The difficulty in choice between alternative tests can, indeed, only be solved
by a further formulation of the requirements of a satisfactory test. Several
lines of attack are open:

(i) To lay down conditions for the form of the power surface in the
neighbourhood of the point $\theta_1^0, \theta_2^0$. Here we may describe the objective as to make as
large as possible the chance of detecting small departures in $\theta_1$ and $\theta_2$ from the
values specified by $H_0$. A method of approaching the problem from this point of
view leads to the development of the unbiased test of Type C (Neyman &
Pearson, 1938).

(ii) To regard it as of more importance to control the form of the power
surface at some distance from its minimum point; for example, to try to select
a critical region for which the power surface reaches the level

$$\beta(\theta_1, \theta_2 \mid w) = 0.95, \qquad (5)$$

along a contour lying inside the corresponding contour associated with any
other test. This method of approach has been examined by Dr B. L. Welch,
but his results are not yet published. It is possible that methods (i) and (ii) will
lead to the same result.

(iii) To consider whether from the practical point of view, if $H_0$ is not true,
the importance of the departure of the unknown parameters from $\theta_1^0, \theta_2^0$ can be
measured by a single parameter,

$$\lambda = f(\theta_1, \theta_2). \qquad (6)$$

If this is so, we are in fact defining a system of contours on the $\theta_1, \theta_2$ plane
along any one of which we should like the ordinates of the power surface to be
constant. Such a system would be defined, for instance, by

$$\lambda^2 = (\theta_1 - \theta_1^0)^2 + (\theta_2 - \theta_2^0)^2, \qquad (7)$$

and if $\beta(\theta_1, \theta_2 \mid w)$ is to be constant for values of $\theta_1, \theta_2$ satisfying (7), the contours
of the power surface will be circles of radius λ. Alternative tests would then be
confined to those whose power surfaces had circular contours, $H_0$ would be the
hypothesis that λ = 0 and the uniformly most powerful test, if it exists, would
be that for which

$$\beta(\lambda \mid w_0) \geq \beta(\lambda \mid w) \qquad (8)$$

for λ > 0 and all alternative critical regions w satisfying the conditions stipulated.
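A power surface with circular contours of the kind described in (iii) can be illustrated with a simple model that is not taken from the note and is assumed here purely for illustration: two independent observations $x_1, x_2$, normal with unit variance and means $\theta_1, \theta_2$, with $H_0$ specifying both means as zero and the critical region $x_1^2 + x_2^2 > c$. The function names are hypothetical. The statistic then follows a non-central χ² distribution with 2 degrees of freedom, whose upper tail is a Poisson-weighted sum of central χ² tails, so the power depends on $\theta_1, \theta_2$ only through $\lambda^2 = \theta_1^2 + \theta_2^2$.

```python
import math

def critical_value(alpha):
    """c such that P(chi-square with 2 d.f. > c) = alpha."""
    return -2.0 * math.log(alpha)

def power(theta1, theta2, alpha=0.05, terms=200):
    """Power of the region x1^2 + x2^2 > c under the assumed normal model."""
    lam2 = theta1 ** 2 + theta2 ** 2      # lambda^2 of eq. (7), null point at origin
    half_c = 0.5 * critical_value(alpha)
    half_l = 0.5 * lam2
    pois = math.exp(-half_l)              # Poisson weight for k = 0
    term = math.exp(-half_c)              # e^{-c/2} (c/2)^j / j! for j = 0
    tail = term                           # P(chi-square with 2 d.f. > c)
    total = 0.0
    for k in range(terms):
        total += pois * tail              # weight times tail for 2 + 2k d.f.
        pois *= half_l / (k + 1)
        term *= half_c / (k + 1)
        tail += term                      # tail for 2 + 2(k + 1) d.f.
    return total

size = power(0.0, 0.0)
print(size)                                # the significance level alpha
print(power(3.0, 4.0), power(5.0, 0.0))    # equal: both points lie on lambda = 5
print(power(1.0, 0.0), power(2.0, 0.0))    # power rises with lambda
```

Any two parameter points on the same circle give the same ordinate, which is the property required of the contours in (7); whether such a region is the most powerful among regions with this property is the question taken up in the papers that follow.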

The problem thus presented in the case of a simple hypothesis concerning
two parameters will arise in similar form when $H_0$ is composite and concerns the
value of many parameters $\theta_1, \theta_2, \ldots, \theta_s$. In a number of multivariate problems
we have reached a position in which:

(a) tests of statistical hypotheses concerning the values of several population
parameters have been derived, as well as their power functions;

(b) these power functions have been shown to depend on the value of a
single function

$$\lambda = f(\theta_1, \theta_2, \ldots, \theta_s)$$

of the parameters considered.

In the following contributions Dr Hsu and Dr Simaika have examined three 
of these tests, that concerned with the general linear hypothesis, with Hotel- 
ling’s generalized T? and with the multiple correlation coefficient. They have 
shown that of tests whose power function depends only on a certain function A 
of the population parameters, the existing tests are the uniformly most powerful. 
It is of course true that in the problems in question no alternative tests are at 
present available or indeed likely to become so. Nevertheless, I believe that the 
discovery, resulting from Dr Hsu’s initiative, of the relationship between the 
test function and a corresponding comprehensive collective character in the 
population, has taken us a step farther in our understanding of the properties 
of statistical tests. Further, this relationship between E? and A, T? and y^, 
D? and 43, E? and p? seems to lead us round ΒΕ another route to the problem of 
statistical estimation. 

REFERENCES

KOŁODZIEJCZYK, ST. (1935). Biometrika, 27, 161.
NEYMAN, J. & PEARSON, E. S. (1938). Statist. Res. Mem. 2, 26.


ANALYSIS OF VARIANCE FROM THE POWER FUNCTION
STANDPOINT

By P. L. HSU

A FRESH study on the classical analysis of variance tests in the light of the
Neyman-Pearson theory was started by Kołodziejczyk (1935), who formulated
the class of linear hypotheses for which these tests may be employed. As a linear
hypothesis is defined relative to the set of admissible hypotheses, the study of
the E²-test (by which we denote any test falling under the usual methods of
analysis of variance) may be made with reference to its power function. P. C. Tang
(1938) showed how the power function was related to R. A. Fisher's (C)
distribution (Fisher, 1928) and so was able to appraise the chance of detecting the falsehood
of a linear hypothesis using the E²-test. The great theoretical value of the power
function lies, however, in its use in comparing the relative merits of alternative
tests of the same hypothesis. In this paper we shall prove a theorem (p. 63)
which asserts that out of a certain class of tests the E²-test is uniformly most
powerful.

In his paper Tang has used an orthogonal transformation in the sample space
which enabled the general linear hypothesis to be reduced to the following simple
form: Given the elementary probability law


; EPI n 


where all real values of $\eta_1, \ldots, \eta_m$ and all positive values of σ are admissible, the
hypothesis is that $n_1$ ($\leq m$) of the η's have the true value 0:

$$\eta_1 = \eta_2 = \ldots = \eta_{n_1} = 0. \qquad (2)$$

We call the above hypothesis $H_0$.

We shall set H " i . 
= Sut/( Eats Σα), | (3) 

i i-1 i=l . 

and call w, (of size €) the critical region for the rejection of H, defined by the 
inequali i 

dd BY> Bt, (4) 
where E? is a constant so determined that the probability that (4) is true, given 
that (2) is true, equals e. 

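The power function of $w_0$ as given by Tang can be written

$$e^{-\lambda}\sum_{h=0}^{\infty}\frac{\lambda^h}{h!\,B(\tfrac12 n_1+h,\tfrac12 n)}\int_{E_0^2}^{1}(E^2)^{\frac12 n_1+h-1}(1-E^2)^{\frac12 n-1}\,d(E^2), \tag{5}$$

where

$$\lambda = \frac{1}{2\sigma^2}\sum_{i=1}^{n_1}\eta_i^2. \tag{6}$$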
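The series (5) can be evaluated numerically. The sketch below assumes that (5) is the usual Poisson mixture of central beta tail areas, with weights $e^{-\lambda}\lambda^h/h!$ and beta parameters $\tfrac12 n_1+h$ and $\tfrac12 n$. The names `reg_inc_beta` and `power_e2_test` are hypothetical, and the incomplete beta function is computed by the standard continued-fraction method, not by anything taken from Tang's tables. It is adequate only for moderate $\lambda$.

```python
import math


def _betacf(a, b, x, max_iter=500, eps=1e-14):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:  # test after the second (odd) step
            break
    return h


def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def power_e2_test(lam, n1, n, e0_sq, tol=1e-12, max_terms=10000):
    """Power of the region E^2 > e0_sq as a Poisson mixture of beta tails."""
    weight = math.exp(-lam)  # Poisson weight for h = 0
    total, cum = 0.0, 0.0
    for h in range(max_terms):
        total += weight * (1.0 - reg_inc_beta(0.5 * n1 + h, 0.5 * n, e0_sq))
        cum += weight
        if 1.0 - cum < tol:  # remaining Poisson mass is negligible
            break
        weight *= lam / (h + 1)
    return total


print(power_e2_test(0.0, 2, 2, 0.95), power_e2_test(2.0, 2, 2, 0.95))
```

For $n_1 = n = 2$ the beta tails are elementary and the series sums to $1 - E_0^2\exp\{-\lambda(1-E_0^2)\}$, which gives a convenient check; at $\lambda = 0$ the power reduces to the size of the region.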


An outstanding feature of the power function (5) is that it depends on the single
parameter $\lambda$. Our problem is, does there exist another critical region of size $\epsilon$
whose power function depends on $\lambda$ alone and which is more powerful than $w_0$ for
certain values of $\lambda$? The answer is contained in the following theorem and is in
the negative.


THEOREM. Suppose that the critical region $w$ satisfies the following conditions:
(a) $w$ is of size $\epsilon$,
(b) the power function of $w$ depends on the single parameter $\lambda$.
Let $\beta(\lambda)$ be the power function of $w$ and $\beta_0(\lambda)$ be the power function (5) of $w_0$. Then

$$\beta(\lambda) \le \beta_0(\lambda) \tag{7}$$

for all positive values of $\lambda$.
Proof. In the place of $z_1,\ldots,z_n$ we substitute spherical co-ordinates, viz. the
radius vector $r = (\sum z_i^2)^{1/2}$ and $n-1$ angles, $\theta_1,\ldots,\theta_{n-1}$. We deduce from (1) that


n=l’ 
Bs -Ym Ors s Oni T) = PY as το Ym) PY myers Vm) Ῥίθι, ... θη. α)Ῥίσ), 


(8) 
: m ; 
where Ῥίψι, -..» Ym) = (277) σ) -™ exp [-3À =n}, (9) 
RES m m 
Passt) = (met exp[-3 X wor) dO 

ply) = 2:99 (jn) esp ( - art, (1) 

and $p(\theta_1,\ldots,\theta_{n-1})$ is the well-known product of cosines which involves none of
the parameters $\eta_1,\ldots,\eta_m$ and $\sigma$.
We now make the following successive transformations:

$$r^2 = t-\sum_{i=1}^{n_1}y_i^2, \qquad y_i = \sqrt{t}\,u_i \quad (i = 1,\ldots,n_1), \tag{12}$$

and also write

$$\eta_i = \sigma\gamma_i \quad (i = 1,\ldots,n_1). \tag{13}$$


It follows that 


PY μη op σα Un ὃ) Ν 
= PY nats abt Ym) plôu- té. 8, 3) p, tts Uny t), (14) 
where 


(Uy, -<-s Uny 0 = (20) 0m? ΤΗ ελ pinin) s 5 " 


' x(1- Su)" "on (¥ Eva. (18) 


{5-1 



From now on we shall write $y$ for the set of variables $y_{n_1+1},\ldots,y_m$ and $dy$ for
$dy_{n_1+1}\ldots dy_m$, and use similar abbreviations $\theta$, $u$, $d\theta$ and $du$.

Suppose now that the critical region $w$ satisfies the conditions (a) and (b).
Let $\Gamma(y,\theta,u,t)$ be the characteristic function of $w$, i.e. $\Gamma(y,\theta,u,t) = 1$ or 0 according
as the sample point falls within $w$ or not. Let $W$ be the sample space.
Then the power function of $w$ is

$$\beta(\lambda) = \int_W \Gamma(y,\theta,u,t)\,p(y,\theta,u,t)\,dy\,d\theta\,du\,dt, \tag{16}$$
~ whence, 
(yearn Tyo uN ply) pia) erm + 
Ww 


t της ας i(n—2) m 
,Xexp ( ~ 5,0] E -X4) dy dO dudt = marn) 6, (17) 
(J2 eye Γίν, 6, u, t) ply) 2(6) piece -9) 
i W 


t - ae 4(n—2) Jt mo 
x exp ( -παὶ (2 - $a) exp E γαι) dyd@dudt 
= se In) AAA) = F0) = F( $), say. (18) 
tw ; 
Let $W_1$ be the sample space of $\theta$, $u$ and $t$, and put
(2oy f rt 0, u, p enm 
IF: 


`~ 


t ^h Š in-2) alt fü 
x exp (—575] (1-5 4) exp ($ X vos) doduat 


=] 

x ς — F(A) = φίν,γ,σ). (19) 
Then, by (18), [s D e o ὦ = 0, (20) 
s eo o " 1 m i » 

1.6. nt s $ Vn, i1; ehm Yo c Yn c) exp ( 20$; E s) 2 
x oxp( X αμ] dy, a dy — 0, (21) 
t=n+1 


on writing a, for (2a?)719, (i n4 +1, ...,m). 
Equation (20) must hold true for all real values of the a’s. Hence it follows 
from the well-known theorem on Laplace transformation* that 


1 m 
g)exp| —z— 3|—0 
Ply, Y, σ) exp ( : πε, Σ 9 
* Cf. Doetsch (1937), p. 35, Theorem 1. Though the theorem referred to is stated for the case
where the number of $y$’s is one, it may easily be extended to the case of more than one $y$ by
induction.


i.e. $\phi(y,\gamma,\sigma) = 0$, whence, by (19),


Wzor f Ty, 8,u, t p(6) tin 
Wi 
t πι Qm» jt πα Moi g 
xexp| -zza (1- Σ υἳ exp 6 ΣἙγιιι]άθάυαί = F| 3jy1|. (22) 
i=l σι v-1 
In particular, from $\phi(y,0,\sigma) = 0$ and (17) we have


(J2 σ) etn) i . I'(y,0,u,t) p(0) t» - . 
w 


1 


iy Mm in-3) ^ 
xexp (555) (1- DES dódudi = mri I'($n)e. (23) 
i=1 
Letting $W_2$ be the sample space of $\theta$ and $u$, we get respectively from (23) and
(22) that


5a qao [^ sein : αυ... 
` 0 - 


x exp (^x) di | Γίν, 6, u, t) p(0) (2 -- Σ ut)" dodu = n (dn)e, (24) 
2σ3 W, i 1.13 


ni Hn—2) 
(yaoymim | ΡΜ p (1- 2:4] 
0 20° m, -ι ο). 


4t Mm - θ᾽ 
X exp | 2 Vite) du = F| Ὁ γῇ]. (35) 

íi-1 væl / 
Hence, on developing the left-hand side of (25) into power series in the y’s, we 
must have 


© ÜN 
κ 249) —— |d 
[ναοί — se) 
in—5 ( πι 


. 
x | T'(y,0,u,t) (6) (i- Σ : (S yes) dOdu = 0 for odd h, (26) 
Wi 4m] i-1 


/ 


@ 
+ QAM) g—(n-+ny+2h) Í {+1 ah 
0 


t MP i1n—2) { πι - ah 
x οχρ{ -zat | Γίν 6, u, t) pa (1- $a) ( 2; για] dO du 


=l 
n U 
= a,( 378) (A=1,2,3,...). (27) 
Further, equations (24) and (27) may be written as 
o M t 
3n-rn—2) -— E 

f t a οχρ( za) Ρ 

fh 1-3) . gi I(4n) 
I(y,0,u,t) p(0)| 1 — 3 u ἆθ άν — 

] «|... BATES { £u) ji I*3(n-4- 04) 







e] di = 0, (28) 


το x t Mm 4-3) 
f ο... | ry, 0,4, noo (1- Σ οὐ] 
0 20 F, TES 


$i αι Σ Σ A). 
i (Eve ας “2h gn +n) τη 
Equations (28), (26) and (29) must hold true for all positive values of σ. 
Hence, by the theorem of Laplace transformation,* the functions within the 
square brackets in (28) and (29) and the inner integral in (26) must vanish 
identically:





=0 (h=1,2,3,...). (29) 


‘ κ... i(n—2) gi ran), 
iM I'(y,0,w,t) p(0) (1 Ža) d6du = Πω (30) 


Mm 


A(n—2) / πι ] 
Í T(y,0,u,t) p(0) (: -5 uf) ( £ Z3 ἆθάν = 0 for odd h, (81) 
W, {51 i=1 


f, Tounoa (i X a)" (È veu) dau 


im] 
ny h 
τ. alà») 
i Γη πι) τς (h=1,2,3,...). (32), 
From (31) and (32) we infer that 

^h i(n—2) fh ; 
f Tuun 200)(1~ Sa) exp( È vau)adu = a (Ex). c» 

i=l 


Now for wy given values of y and { the integral f T(y, 0, u, t) f(0, u) ἆθ du 


equals |. AK (6, u) dð du, where w, is the set of points in the sample space of 0 and u 
for which /'(y, 0, u,t) = 1. Hence (30) and (33) are equivalent to 


|. p(0) (1 - ΕΝ dé ἀν = ES Tide (34) 
J τ ( x Eu) de τ (ἃ ya) d BAM (ἕν) ` (35) 


The conditions (34) and (35) are necessary and sufficient that the critical region
$w$ should have the properties (a) and (b).
On the other hand, from (12) we have

$$E^2 = \sum_{i=1}^{n_1}u_i^2, \tag{36}$$

hence $w_0$ is the region defined by the inequality

$$\sum_{i=1}^{n_1}u_i^2 > E_0^2. \tag{37}$$
* Cf. footnote, p. 64.


Since to is of size c, we must get the same right-hand side of (34) when in the left- 
hand side we substitute w, for w,. Hence 


M 


E p(0)(1— Ža) “dod z |. πό(-- $ a)" ao. (38) 


Let [ος e- Eu (Zu) atas = ΠΡΙ (39) 


With the help of (37) and (38) we may now appeal to the lemma proved in the
Appendix and conclude that

$$G(x) \le G_0(x), \tag{40}$$


whence, replacing $\gamma_i$ by $\sigma^{-1}\sqrt{t}\,\gamma_i$ in the integrals in (35) and (39),
Á" d i(n—2) vt m 
[o (1- aa)” exo (T $ yeu) aba 
A πα 10-8) At πα 
«| p(8) (1 -5 z exp (5 > yes) dOdu. (41) 
w (00 451 σι 
If we multiply both sides of (41) by 


| e. T t 
Borm gn) O) Pim exp (- za): 
and integrate over the sample space of $y$ and $t$, we get, in accordance with (18),

$$\beta(\lambda) \le \beta_0(\lambda).$$

Hence the theorem is proved.
APPENDIX 


LEMMA. Let $g(z) \ge 0$ be defined for $z \ge 0$ and vanish for $z > 1$, such that
$g(v_1^2+\cdots+v_n^2)$ is summable. Let $f(w_1,\ldots,w_m) \ge 0$ be summable. In the product
space of the $v$'s and $w$'s let $R$ be a region such that

$$\int_R f(w_1,\ldots,w_m)\,g(v_1^2+\cdots+v_n^2)\exp(\gamma_1v_1+\cdots+\gamma_nv_n)\,dv\,dw = G(\gamma_1^2+\cdots+\gamma_n^2). \tag{1}$$

Let $R_0$ be the region defined by the inequality

$$v_1^2+\cdots+v_n^2 \ge k. \tag{2}$$

Let

$$\int_{R_0} f(w_1,\ldots,w_m)\,g(v_1^2+\cdots+v_n^2)\exp(\gamma_1v_1+\cdots+\gamma_nv_n)\,dv\,dw = G_0(\gamma_1^2+\cdots+\gamma_n^2). \tag{3}$$*

Finally, let

$$\int_R f(w_1,\ldots,w_m)\,g(v_1^2+\cdots+v_n^2)\,dv\,dw = \int_{R_0} f(w_1,\ldots,w_m)\,g(v_1^2+\cdots+v_n^2)\,dv\,dw. \tag{4}$$

Then

$$G(x) \le G_0(x) \tag{5}$$

for all positive $x$.
* Notice that (3) is not a separate condition on $R_0$, but is implied by (2).


Proof. In (1) we set $\gamma_1 = x$, $\gamma_2 = \cdots = \gamma_n = 0$, and get

$$G(x^2) = \int_R f(w_1,\ldots,w_m)\,g(v_1^2+\cdots+v_n^2)\exp(xv_1)\,dv\,dw.$$

This and the conditions on $f$ and $g$ imply that $G$ is continuous.
Multiplying both sides of (1) by $\exp(-\sum\gamma_i^2)$ and integrating over the region
$0 < a < \sum\gamma_i^2 < b$, we get


b 
K i a9 e-t Q(x) dz 


a 


; -{ ftus... y) EWA) dou | exp(—Zyi--Xy,vj)dy, (6) 
R ας Ey,«b 


where K is some numerical constant. Applying a rotation in the space of the y’s 
to the inner integral in the right-hand side of (6), we obtain 


~ » N à - 


. vb. J 
^K f ` gii ez Q(x) dz 


-f ζω, ... qo) dvdw I, exp { — Σα + (208) αι] dæ 
ελ 
m Σ c, I (F), 
A=0 
1 
where . ΤΘ) = ax; | gfe ἐκ Wy) φ(Συ8)᾽ ἀυ ἄω, 


a= f ror Coe 
agin’ <b 
Similarly, we have 
b e 
κ gin-9 6-2 G (x) da = = Catal Bo). 


An appeal to a general lemma of Neyman and Pearson,* on remembering
(4), leads to the inequality

$$I_h(R) \le I_h(R_0).$$

Hence

$$\int_a^b x^{\frac12 n-1}e^{-x}\{G(x)-G_0(x)\}\,dx \le 0.$$

Since $a$ and $b$ are arbitrary and since the integrand is a continuous function, the
latter must be $\le 0$. Hence $G(x) \le G_0(x)$.
* Neyman and Pearson (1936), p. 11.




REFERENCES 


DOETSCH, G. (1937). Theorie und Anwendung der Laplace-Transformation. Berlin: Julius
Springer.

FISHER, R. A. (1928). “The general sampling distribution of the multiple correlation
coefficient.” Proc. Roy. Soc. A, 121, 654-73.

KOŁODZIEJCZYK, ST. (1935). “On an important class of statistical hypotheses.” Biometrika,
27, 161-90.

NEYMAN, J. & PEARSON, E. S. (1936). “Contributions to the theory of testing statistical
hypotheses.” Statist. Res. Mem. 1, 1-37.

TANG, P. C. (1938). “The power function of the analysis of variance tests.” Statist. Res.
Mem. 2, 126-49.


ON AN OPTIMUM PROPERTY OF TWO IMPORTANT
STATISTICAL TESTS


By J. B. SIMAIKA, Ph.D. 


P. L. Hsu (1940) has shown that for any linear hypothesis the $E^2$-test is the
uniformly most powerful of all the tests whose power function depends on a
certain function, $\lambda$, of the population parameters. Two other tests of importance,
namely, those associated with the multiple correlation coefficient and Hotelling’s
$T^2$ (Hotelling, 1931), have the similar property of being uniformly more powerful
than all other tests whose power functions depend on the respective functions
of population parameters involved in the distributions of $R^2$ and $T^2$. It is the
purpose of this paper to establish such an optimum property of these two tests.
We shall consider them separately.


I. HOTELLING'S $T^2$
The general problem that calls for the T?-test may be stated in the following 
way: given the elementary probability law 


PY. t Yp 811» S195 € 
X g a 
= K |84 |" exp | τα, x Ay (4 — 74) (4, ση) -ᾱ X aysal (1) 


os (αι) αμ, 85 = 8j) 
it is required to test the hypothesis that

$$\eta_i = 0 \quad (i = 1,\ldots,q). \tag{2}$$
Hotelling’s test consists in calculating

$$T^2 = \sum_{i,j=1}^{q}s^{ij}y_iy_j, \tag{3}$$

where $s^{ij}$ denotes the general element in the matrix $\|s_{ij}\|^{-1}$, and rejecting the
hypothesis if

$$T^2 > T_0^2, \tag{4}$$

where $T_0^2$ is a constant so determined that the risk of rejecting the hypothesis
when it is true equals $\epsilon$.

The distribution of $T^2$ derived from (1), which conforms with Fisher's (C)
distribution (Fisher, 1928), was obtained independently by Hsu (1938) and Bose
& Roy (1938), and may be written


z(T* 9, α) = p(T? | 4°) 


2e Σ (yey 1 (T3)eh1(1 4. Teymar (5) 
' n=o A! B(iqth im) ? 


“ g 
where . Y= Ln PRSE (6) 





Hence the power function of the T?-test is | 


[minam g 


which depends only on the function $\psi^2$ of the $\eta$’s and $\alpha$’s. Our first theorem asserts
that the $T^2$-test is uniformly more powerful than any other test whose power
function is a function of $\psi^2$ alone.

THEOREM I. Let $w_0$ (of size $\epsilon$) be the critical region defined by the inequality (4),
and $w$ be any other critical region whose size is $\epsilon$ and whose power function is a
function of $\psi^2$. Let $\beta(\psi^2)$ and $\beta_0(\psi^2)$ be the power functions of $w$ and $w_0$ respectively.
Then

$$\beta(\psi^2) \le \beta_0(\psi^2). \tag{8}$$

Proof. Let us first find a necessary and sufficient condition that w should 

have the properties described in Theorem I. We have, from (1), 


α΄. d 
p(y, 8) = Ke | 8; |n! exp | a t 2i eun T4) : ἃ avt (9) 


Hence, on setting : Uy — $41 ($3 = 1,--59)5 (10) 
and & = P GU, ($1, 9 ` Qu 
we have ply,u) = Ke" |uy—y,y, pr-texp (- PE Gata + Σ ον) (12) 
and e Wadd GC | ' (13) 


where αὖ denotes the general element of the matrix ||2;; ||. 
If w is of size c and has a power function depending only on y?, then 


κ]. (Us — Yyy | exp(—4, ἃ ayuy) dydu = € (14) 

and ας 
K E |y tnim exp (4. X iie Σ ; Qun) duda = e" Bly?) = Ῥίψϑ), say. 
(15) 


It follows Bot (15) that, on expanding the left-hand side into a power series in 
the C'8, we must have 


h 
|ug e, lm exp ( -4 Σ ayy} (3 Gay) dydu = 0 for odd h, (16) 
i tye] j=] 
K : im—1 $ Σ 2 
κ ο... 
1 Mg, 
ela" (b= 1,2,8,...), (17) 


~ 


where the $a_h$ are numbers depending only on the region $w$ chosen.


i A 
whence zf Ὁ ο. » 
TF íj- 




On the other hand, since the integral of (12) over the sample space W is unity, 
we have 


q q - : 
KÍ |y wis 7t exp ( - à X agugt M ων) dydu = ον", (18) 
ΤΡ il t=1 


αγ} dydu = 1, | - (19) 


i MICE Ww 
(2h)! f. | a4 — AV, ms exp ων τι αμ us) [> gn) dydu 
=A (h = 1,2,3,...). (20) 
Combining equations (14) and (19), (17) and (20), we obtain 
a j 
ii [ug = yy | exp ( — 2 x : ast) dydu 


q 
=ef | ug — YY jeexp(- x aug dyu, (21) 
7 Ww $271 
q q 2h 
Í stis | exp( -4 Σ yug ( 3: 6s) dydu 


q q 2A 
= ay | lusti; exp ( -à 2 2 X ων) dydu 
w 6151 i21 
(h= 1,2,3,...). (39) 


The sample space $W$ is the product space $W(u) \times W(y|u)$, where $W(u)$ is the
sample space of the $u$’s and $W(y|u)$ is formed of the possible positions of the
point $(y_1,\ldots,y_q)$ for given values of the $u$’s. Similarly, $w = W(u) \times w(y|u)$. If we
evaluate the integrals in (21), (16) and (22) as repeated integrals, we obtain


σ 
[αι δω) —. 
Wu) £371 


x E | ttig —349; 77 dy- f |244— Yy; "71 dy | du = 0, (23) 
wiylu) Wil) 


q de q A 
Í exp ( -ἐΣ a) L | 144 — UY, [ioc ( Σ Gu) dy |du = 0 for odd ^, 
TF (u) i21 w(ylu) c vel (24) 


2, ᾿ q 2h 
f exp( -4 à ast) f ο n (15 Ses) dy —a, 
To) 17-1 w(ylwu) - i=l 
σ 2h 
ο ο οκ ο 5. 
W (ylu) i-1 D 


Since equations (23), (24) and (25) must hold true for all admissible sets of values
of the $\alpha_{ij}$, so, according to the lemma proved in the Appendix, the functions
within the square brackets in these equations must vanish identically. Hence


ug 40 |d =ef V4; yy; | dy, 26 
EN uc Wl y ΜΉΝ ο wl y . (26) 


q À 
Í | 144 — Vi; e | 2x ων) dy = 0 for odd h, (27) 
wi(y|u) ` iml 
qc 2h zh 
δν | 44 = Yay, |n (à ὄν) dy = αι ΠΕΝ As 3 7 9, |? ie Σι hays) ἂν 


(—1,2,8,) (38) 


In order to simplify the above equations we notice that the matrix $\|u_{ij}\|$,
being positive definite, can be thrown into the form $CC'$, where $C$ is a non-singular
real matrix. Using the transformation

$$\|y_1,\ldots,y_q\| = \|x_1,\ldots,x_q\|\,C, \tag{29}$$

we get that
δ E q ἐπι--1 q ἐπι--1 
ο Bat)” e-[ -àa« ae (30) 
tjw(z|u) wl Welw) i=l ] 
q im-1 
Í Ἱ- X 2) 12 X ju) de = 0 for odd ^; (31) 
w(z|u) ἐ--1 


| αγ Ge ye 


i-1 
where ll ἔν»... ἔα] = Wo -- SiG. - (33) 
Now $W(x|u)$ is the region $S$ (independent of the $u$'s) defined by the inequality

$$\sum_{i=1}^{q}x_i^2 \le 1. \tag{34}$$


Hence the integral in the right-hand side of (30) is a numerical constant, say $b$,
and a rotation in the space of the $x$'s enables the integral in the right-hand side
of (32) to be written as


q h : q imi : 
(5a) | (-Ea a (38) 
i=l 5 {51 ; 
Hence we obtain the following equivalents of equations (30), (31) and (32): 
im- 
Í (1- Σα de=be, - (36) 
αὐ(α]ιθ) 1-1 ΝΗΡ 
q ἀπι1/α h : 
Í (1- Σα (Ea) de <0 for odd A, (37) 
waju) i-1 i=l i ΄ 


Ίο (i- Σ κ και) de= s ($a) (h = 1,2,3,...), (38) 


tel 
where the $b_h$ are numbers depending only on the choice of the region $w$.


The set of equations (36), (37) and (38) give the necessary and sufficient
condition that the critical region $w$ should have the properties described in
Theorem I. Further, equations (37) and (38) may be combined into the
following:


q na / a 5 a pt 
[ua ES)" (È teji = σ( Σ8). (39) 
w( xu) =l 1-1 1551 | 
Now according to (29) we have

$$\sum_{i=1}^{q}x_i^2 = \sum_{i,j=1}^{q}u^{ij}y_iy_j = \frac{T^2}{1+T^2}, \tag{40}$$

where $u^{ij}$ denotes the general element of the matrix $\|u_{ij}\|^{-1}$. Hence $w_0$ is the region
defined by the inequality

$$\sum_{i=1}^{q}x_i^2 > \frac{T_0^2}{1+T_0^2}. \tag{41}$$
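The second equality in (40) can be checked numerically. The sketch below assumes $u_{ij} = s_{ij} + y_iy_j$ and $T^2 = \sum s^{ij}y_iy_j$, with $s^{ij}$ and $u^{ij}$ the elements of the inverse matrices; the matrix, the vector and the helper names are made up for illustration only.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def quad_form_inverse(mat, y):
    """Return the quadratic form y' mat^{-1} y without forming the inverse."""
    x = solve(mat, y)
    return sum(yi * xi for yi, xi in zip(y, x))


# Illustrative positive definite s_ij and sample vector y (q = 3).
s = [[2.0, 0.5, 0.1], [0.5, 1.0, 0.2], [0.1, 0.2, 1.5]]
y = [1.0, -1.0, 0.5]
u = [[s[i][j] + y[i] * y[j] for j in range(3)] for i in range(3)]

t2 = quad_form_inverse(s, y)   # T^2 as in (3)
lhs = quad_form_inverse(u, y)  # sum of u^{ij} y_i y_j
print(lhs, t2 / (1.0 + t2))
```

Because $T^2/(1+T^2)$ increases with $T^2$, the region $T^2 > T_0^2$ corresponds to a lower bound on $\sum x_i^2$, which is how the inequality (41) arises.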


Since $w_0$ is of size $\epsilon$, we must have the same equation as (36) when $w$ is replaced
by $w_0$ therein. Hence


J wihi) ( i PLE j f NU -È J ae | (42) 
κα, | w ( z at)" RAN (28) fes (E g), (43) 


we deduce with the help of (41), (42) and the lemma proved by P. L. Hsu in the
Appendix of his paper (1940) that
| q j ᾳ 
a(S ea)<a,( he). (44) 
i-1 te 


Applying the transformation reciprocal to (29) to the integrals in (39) and (43), 
we get 


a q 
Í | 4g — yry; | exp (Σ ων) dy < [ | ti — ysy; [7 exp Σ ων) dy. 
wiyiu) i-1 wiylu) i=] 
! (45) 
Hence on multiplying both sides of (45) by K exp ( — y?) exp ( -ᾱ Σ ay) and 
12-1 
integrating over $W(u)$ and remembering (15), we have the inequality (8):

$$\beta(\psi^2) \le \beta_0(\psi^2),$$

which was to be proved.




II. MULTIPLE CORRELATION COEFFICIENT
In this connection the basic elementary probability law is taken to be


p, YoYo Cray Lis ... Bgg) = K1- P] z gy, ... Ψα iinet) 
δι Tu ig 
"Ys Ta Ὅρα 


a q 
x exo( -bve- $ Aa 3 auzu) (1) 


(6,7 αι Ty = y), 


` 


q 
where p* - Σ a? BB; (2) 
» : Yuin i 
is the square of the multiple correlation coefficient of the population. We have 
a 
Z Yı ... Yq = [zyl (2- > yyy). (3) 
Yı Pa ο Be : (751 
Yq Ty -- Ορ 


and that the square of the multiple correlation coefficient of the sample is 


κ μες 
R=- gH yc. 4 
"DR δν y (4) 
The hypothesis to be tested is that

$$\beta_i = 0 \quad (i = 1,\ldots,q). \tag{5}$$


THEOREM II. The basic elementary probability law and the hypothesis under test
being given by (1) and (5), let $w_0$ be the critical region of size $\epsilon$ defined by the inequality

$$R^2 > R_0^2 \tag{6}$$

and $w$ be any other critical region whose size is $\epsilon$ and whose power function depends
only on $\rho^2$. Let $\beta(\rho^2)$ and $\beta_0(\rho^2)$ be the power functions of $w$ and $w_0$ respectively.
Then

$$\beta(\rho^2) \le \beta_0(\rho^2). \tag{7}$$

Proof. Suppose that w has the properties described in Theorem II. Then 


‘mn —q—2) 


σ kc σ 
K Í | a, [407279 (z -- Σ αὖ van) exp ( -ἐγε-ἑ È αὐτὶ dzdydx = e, 
το 5321 *2-— 


(8) 
g 3n1—3—2) 
K Í | z,, POD (z — Σ ati νιν) 
w 1,755; 
a q 
x exp ( —4yz- 3 D ig PE νι) dadydx 


= (1— p°)" (0?) = F(p?), say. (9) 




Hence, on developing the left-hand side of (9) into a power series in the $\beta$’s,
we have


g : in—g—9) 
i [ορ evi >} ayay) 
w $j-1 


t 


- h 
X οχρ --ἆγο--ἆ Σ ago) ( X Box) dzdydx = 0 for odd h, (10) 
íj-1 iei 


K |z [in-2-9 z— Σ gH uy 
(2h), F επι. 


g q 2h 
xexp( -372-4 Σ auzu) ( $ Aivi) ἀεᾶγάς 
_ T(4n+h) 
~ hl Tln) 
where the $a_h$ are numbers depending only on the choice of the region $w$.
On the other hand, since the integral of (1) over the whole sample space W 
is unity, we have 
a 
zf [αρ leen (i Σ ay, 
wW i, 


1-1 


pu 


a (p> (h=1,2,3,...), (11) 


js 


[4 4 
xexp( -472-4 $ ayzy— X An) dedyde = (1~pty 4%, (19) 
,1- = 
whence i 


K Í [αμ |i ή Σ αὐ ΤΝ 
μη ied © Soe z 
p Ἢ vnd Yi; 


exp ( -ἐγα-- i $ auza) dzdydx = 1, 
K dn-2—9) | (13) 
q n-q-— 
Pii in—-a-3) | z — ή 
μμ.) 
ᾳ q 3A C 
- xexp( -472-4 $ oun) (3 Pau) cava: 
— Tn) 
~ MIY(n) 
Combining equations (8) and (13), (11) and (14), we obtain 


K g P in—92—3) 
NL 
w iíij-1 


(p (h = 1,2,3,...). (14) 


a 
x exp( -472-4 X ayay) dzdyda = ef ‘(...)dzdyda, (16 
. 4, = Ww 
` q Kn—q—2) 
Í [αρ |Kn7«-9 ί-- Σ Zr 
«9 το {1-1 
4 q 2h 
xexp(—2y2-4 È aya) (Zw) dedyde 
= t= 


-a | (..)dedyde (h = 1,2,3,...), (16) 




where the unwritten integrands in the right-hand sides are the same as those
in the left-hand sides.
As before we argue that $W = W(z,x) \times W(y|z,x)$, $w = W(z,x) \times w(y|z,x)$ and
evaluate the integrals in (15), (10) and (16) as repeated integrals. It follows that


[4 
Í E aj pa-an exp (— tyz-F X ast] p (2-3 yy, 
We, α) $1771 w(ysz). 1j-1 


^ € X 1(n—9g—2) 
-6 (z - > af νων) 
Wyls, ο) ta=l 


g 
MENDES (-#7--4, Σ ns) 
W(s, 2) íj-1 


ἑίη--ᾳ--3) 
) * 


ἂν | dzdz = 0, (17) 


/ q in—2—-2 / a h 3 
. p (-- £ ayey) ( X Bav) dy |\dzdx =-0 for odd h, (18) 
to(y|z, z) Aral . . i=l Fd MEE o 


αι 


’ d * 
χμ [αρ (- z— Gi. ) 
MENDES 


Le n "n 


^—2—3)/ q 2h z 
-a | ί---Σ Σ vy)" Σ Ps) dydezdz — 0 (h=1,2,3,...). 
Ws, x) $j21 1-1 


- : (19) 


ή 


According to the lemma, proved in the Appendix, the functions within the 
square brackets in the above equations must vanish identically. Hence 


| (:-.Σ Σ zy, 
w(yis, 2) $j-1 


` Xn—a-2)/ a M 
i (;- 3 Σ vty, n) ( 5 Bin) dy = 0 for odd h, (21) 
ο(ν]5, z) i-1 f 


$291 


q in—2-2/ @ 2h 
Í (- — X aly, v) ( x Poss) dy 
w(y|s, x) 611 i=1 


=a f (...)dy (h21,2,8,...). (22) 
wiyle, Ὁ) 


4-29 » 
| 2 dy =ef (-)dy, — 0) 
Wyle, x) τ 


In order to simplify the above equations we notice that, since the matrix
$\|x_{ij}\|$ is positive definite, it can be thrown into the form $CC'$, where $C$ is a
non-singular real matrix. Using the transformation


|9ι,... Yell = 25... 61 C, : ΄ (29) 




a 


5 : a .\in—g-2) i 
we obtain ; Í (1- 34] =ef (...) dt, (24) 
utis, x) i=l W (tle, 2) ' 
d Ἠπ--α--8) h 
: f (1- Σ a) σα ( Σ τα] di = 0 for odd h, (25) 
witle, ο) tel 00M | 
a .Mn-q—2/ a 2h S 
l (1-3 a) (Στα dt = a, | - (...)dé (h= 1,2,3,...), (26) 
vwüls, x) i1 i-i W (is, α) ANE á 
where ewer lees all C., 


Now $W(t|z,x)$ is the region $S$ (independent of $z$ and the $x$’s) defined by the
inequality

$$\sum_{i=1}^{q}t_i^2 \le 1. \tag{27}$$

Hence the integral on the right-hand side of (24) is a numerical constant, say $b$,
and a rotation in the space of the $t$'s enables the integral on the right-hand side
of (26) to be written as


@ À 5 a i(n—g—2) 
(Σπ) [ (1- Za) Badi 3 (28) 
ial 8 i-1 
Hence we have the following equivalents of (24), (25) and (26):
p a V-g-2 
| Í (1- Σ a) dt = be, (29) 
e 9) wltis, 2) i=l : 
4 \Kn-g-9/ α h 
zi (1- Za) (Στο eae odd, (30) 
v(le, 2) i-1 {51 ; . 


i=] 


a \in-g-8) ^a 2 qo 
Í (i- ΣΙ : (Σα) at = δεί Στὴ (h = 1,2,3,...),- (31) 
επί], α) 151 i=1 


where the $b_h$ are numbers depending only on the choice of $w$.
Equations (30) and (31) may be combined into the following one:


m z) ( E Σ 8 ΣΡ ( E: 2 T, aja -G (5 1). : (32) 


t=1 


Now, by (4) and (23), we have

$$R^2 = \sum_{i=1}^{q}t_i^2; \tag{33}$$

consequently the region $w_0$ is defined by the inequality

$$\sum_{i=1}^{q}t_i^2 > R_0^2. \tag{34}$$


Since $w_0$ is of size $\epsilon$, we must have the same equation (29) when $w(t|z,x)$ is
replaced by $w_0$ therein. Hence


q- \iln—g—2) α in—g—2) : ᾽ 
Irena? Es) a-[(-Ee «^ . e» 
w(t], a) i=l i y t= 


On setting - i (i-, ο... xp (- Ent) ae~ a ($2), (36) 


1 


we infer, with the help of (34), (35) and the lemma proved by P. L. Hsu in the
Appendix of his paper (1940), that


q a : 
e(Xa«e, (X4). (37) 
Hence, using the transformation reciprocal to (23) to the integrals in (32) and
(36), we have


i 4(n—q—2) σ l 
Í (-- 3 Σ 2D) exp (—3 Bux) dy< Í (...)dy. (38) 
w(yls, 2) 121 ισο(ν|5, z) . 


j=l 
Multiplying both sides of (38) by 
σ 
παρ | x, σαν exp(— 72-4 Σ — 
$271 
and integrating over the space $W(z,x)$ and remembering (9), we obtain

$$\beta(\rho^2) \le \beta_0(\rho^2).$$

Hence Theorem II is proved.


I am gratefully indebted to Dr P. L. Hsu for putting this problem before me
and for his helpful suggestions both in the course of my research and in preparing
this paper for publication.


APPENDIX 


LEMMA. Let $E(x)$ be the set of points $(x_{11},x_{12},\ldots,x_{qq})$ for which the symmetric
matrix $\|x_{ij}\|$ is positive definite. Then


f... nett exp (- 3 xy)de<eo G7 1.0 (1) 
E(z) tet 
and ; 

φία) exp (-À« η dæ = 0 throughout E(x) (αμ ο αι αι) (2) 


E(x) 
imply that f φία) = 0 almost everywhere in E(x). (3) 




Proof. Suppose that both (1) and (2) are true. Since the matrix || δρ ἠ- 0, ||, l 
where 6,, = 1 and ὃς, = 0 (+3), 0y = Ojo is positive definite for all sufficiently 
. small real 6’s, so, by (2), 


for all sufficiently small real $\theta$’s. By (1) the left-hand side of (4) is an analytic
function of each of the $\theta$'s in the neighbourhood of the imaginary axis. By
analytic continuation (4) must remain true for all complex $\theta$’s with sufficiently
small real parts. In particular,


q α i 

ῥᾳ)οκρ[-- Σπι]οκρ(ν-1 X tyay)de=0 y= ta) — (9 
E(x) - - 51 1-1 

for all real values of $t$'s. Hence, by the well-known property of the Fourier
transform,


φία) exp ( — X zu) = 0 almost everywhere in E(x), 
i-1 
, which implies (3). 


REFERENCES 


BOSE, R. C. & ROY, S. N. (1938). “The distribution of the ‘Studentized’ $D^2$-statistic.”
Proceedings of the Indian Statistical Conference, Calcutta, pp. 19-38.

FISHER, R. A. (1928). “The general sampling distribution of the multiple correlation
coefficient.” Proc. Roy. Soc. A, 121, 654-73.

HOTELLING, H. (1931). “The generalization of Student's ratio.” Ann. Math. Statist. 2,
359-78.

HSU, P. L. (1938). “Notes on Hotelling's generalized T.” Ann. Math. Statist. 9, 231-43.

HSU, P. L. (1940). “Analysis of variance from the power function standpoint.” Biometrika, 32,
62-9.




MISCELLANEA 


΄ 


(i) A recurrence relation for the semi-invariants of
Pearson curves


By M. G. KENDALL


The Pearson curves are defined by the differential equation

$$dy = \frac{y(x+a)\,dx}{b_0+b_1x+b_2x^2}.$$

Multiplying by $e^{tx}(b_0+b_1x+b_2x^2)$ and integrating over the range of the distribution, we have

$$\int ye^{tx}(x+a)\,dx = \int e^{tx}(b_0+b_1x+b_2x^2)\,dy = \big[(b_0+b_1x+b_2x^2)e^{tx}y\big] - \int ye^{tx}\{b_1+2b_2x+t(b_0+b_1x+b_2x^2)\}\,dx.$$

At the extremes of the distribution we may suppose the expression in square brackets on
the right to vanish and hence

$$\int e^{tx}y\{a+b_1+b_0t+(1+2b_2+b_1t)x+b_2tx^2\}\,dx = 0. \tag{1}$$

The moment generating function of the distribution, $M(t)$, is given by

$$M(t) = \int e^{tx}y\,dx$$

and hence $dM/dt = \int xe^{tx}y\,dx$, etc. Thus from (1)

$$b_2t\frac{d^2M}{dt^2}+(1+2b_2+b_1t)\frac{dM}{dt}+(a+b_1+b_0t)M = 0, \tag{2}$$

a linear differential equation of the second order which may also be regarded as defining the
Pearsonian system.
Incidentally, it would be interesting from the theoretical view-point to consider classes
of frequency distributions defined by differential equations in their moment or semi-invariant
generating functions.
So far as I know there is no solution of (2) in ordinary functions which would permit of
the explicit expression of the co-efficient of $t^r$ in $M(t)$; but from a consideration of the
co-efficient of $t^r$ in (2) we have

$$\{1+(r+2)b_2\}\mu'_{r+1}+\{a+(r+1)b_1\}\mu'_r+rb_0\mu'_{r-1} = 0, \tag{3}$$

the well-known recurrence relation between the moments of Pearson curves.
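The recurrence (3) lends itself to direct computation. The sketch below reads (3) as $\{1+(r+2)b_2\}\mu'_{r+1}+\{a+(r+1)b_1\}\mu'_r+rb_0\mu'_{r-1}=0$ with $\mu'_0 = 1$, which is what the coefficient of $t^r/r!$ in (2) gives. The function name and the two test distributions are illustrative choices, not part of the note: a normal curve with mean 1 and variance 4 corresponds to $a=-1$, $b_0=-4$, $b_1=b_2=0$, and a gamma curve $y \propto x^{k-1}e^{-x}$ to $a=-(k-1)$, $b_1=-1$, $b_0=b_2=0$.

```python
def pearson_raw_moments(a, b0, b1, b2, order):
    """Raw moments mu'_0 .. mu'_order of a Pearson curve from recurrence (3).

    Assumes the moments exist up to the requested order and that the
    boundary term in the integration by parts vanishes.
    """
    mu = [1.0]                                  # mu'_0
    mu.append(-(a + b1) / (1.0 + 2.0 * b2))     # case r = 0 of (3)
    for r in range(1, order):
        num = (a + (r + 1) * b1) * mu[r] + r * b0 * mu[r - 1]
        mu.append(-num / (1.0 + (r + 2) * b2))
    return mu[: order + 1]


# Normal curve, mean 1 and variance 4: dy/y = -(x - 1)/4 dx.
normal = pearson_raw_moments(-1.0, -4.0, 0.0, 0.0, 4)

# Gamma curve with k = 2.5: dy/y = (x - 1.5)/(-x) dx.
gamma = pearson_raw_moments(-1.5, 0.0, -1.0, 0.0, 3)
print(normal, gamma)
```

In the normal case the relation reduces to $\mu'_{r+1} = m\mu'_r + r\sigma^2\mu'_{r-1}$ and in the gamma case to $\mu'_{r+1} = (k+r)\mu'_r$, both of which are familiar results.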

Some simplification of this expression is possible by the choice of a particular origin in
certain cases. If the roots of

$$b_0+b_1x+b_2x^2 = 0$$

are real, it is possible by a real linear transformation to transform the equation defining the
Pearson curves to one which does not involve $b_0$. With the origin defined by this
transformation, we have

$$\{1+(r+2)b_2\}\mu'_{r+1}+\{a+(r+1)b_1\}\mu'_r = 0,$$

$$\mu'_r = (-1)^r\frac{(a+rb_1)\{a+(r-1)b_1\}\cdots(a+b_1)}{\{1+(r+1)b_2\}(1+rb_2)\cdots(1+2b_2)}. \tag{4}$$





Putting $K = \log M$ in (2), we have for the semi-invariant generating function

$$b_2t\Big\{\frac{d^2K}{dt^2}+\Big(\frac{dK}{dt}\Big)^2\Big\}+(1+2b_2+b_1t)\frac{dK}{dt}+(a+b_1+b_0t) = 0. \tag{5}$$

This is not linear, and it appears therefore that there is no simple recurrence relation among
the semi-invariants as among the moments. The equation is similar in character to that
known as Riccati’s and the usual way of solving it would be to return to the linear equation
(2) from which it was derived.

Taking an origin at the mean ($\kappa_1 = 0$) and considering the co-efficient of $t^r$ in (5), we have

$$\{1+(r+2)b_2\}\kappa_{r+1}+rb_1\kappa_r+rb_2\Big\{\binom{r-1}{1}\kappa_2\kappa_{r-1}+\binom{r-1}{2}\kappa_3\kappa_{r-2}+\cdots+\binom{r-1}{r-2}\kappa_{r-1}\kappa_2\Big\} = 0, \tag{6}$$

with the initial relation $\kappa_2 = -b_0/(1+3b_2)$.


Equation (6) seems to be as simple a recurrence relation as we can expect for the expression
of a semi-invariant in terms of those of lower order.
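A short sketch of the recurrence for the semi-invariants follows. It assumes the form obtained from the coefficient of $t^r/r!$ in (5) with the origin at the mean, namely $\{1+(r+2)b_2\}\kappa_{r+1}+rb_1\kappa_r+rb_2\sum_{i=1}^{r-2}\binom{r-1}{i}\kappa_{i+1}\kappa_{r-i}=0$ for $r \ge 2$, started from $\kappa_1 = 0$ and $\kappa_2 = -b_0/(1+3b_2)$. The function name and the two checks are illustrative: a gamma curve with $k=3$ referred to its mean has $b_0=-3$, $b_1=-1$, $b_2=0$, and a Student curve with $\nu = 10$ has $b_0=-\nu/(\nu+1)$, $b_1=0$, $b_2=-1/(\nu+1)$, the relation being used formally in the second case.

```python
from math import comb


def pearson_cumulants(b0, b1, b2, order):
    """Semi-invariants kappa_1 .. kappa_order, origin at the mean (kappa_1 = 0)."""
    kappa = [0.0, 0.0]                       # index 0 is a placeholder; kappa_1 = 0
    kappa.append(-b0 / (1.0 + 3.0 * b2))     # initial relation for kappa_2
    for r in range(2, order):
        # Cross terms of (dK/dt)^2; those involving kappa_1 vanish.
        cross = sum(comb(r - 1, i) * kappa[i + 1] * kappa[r - i]
                    for i in range(1, r - 1))
        kappa.append(-(r * b1 * kappa[r] + r * b2 * cross)
                     / (1.0 + (r + 2) * b2))
    return kappa[1: order + 1]


gamma_k = pearson_cumulants(-3.0, -1.0, 0.0, 4)              # gamma curve, k = 3
student = pearson_cumulants(-10.0 / 11.0, 0.0, -1.0 / 11.0, 4)  # Student, nu = 10
print(gamma_k, student)
```

For the gamma curve the cross terms drop out because $b_2 = 0$, and the relation gives $\kappa_{r+1} = r\kappa_r$. For the Student curve the fourth semi-invariant comes out as $6\kappa_2^2/(\nu-4)$, in agreement with the known kurtosis.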


(ii) A comparison of annual and biennial inflorescences of
Daucus carota (wild carrot)


By WILLIAM DOWELL BATEN 
Michigan State College 


INTRODUCTION 


In 1932 seeds from Michigan and Indiana were gathered from Daucus carota for the purpose
of studying environmental effects on the numbers of pedicels and bracts per inflorescence
from plants grown from Michigan and Indiana seeds. In 1933 these seeds were planted in the
botanical gardens of the University of Michigan in the greenhouse and later planted outside.
In 1933, 44 % of the plants bloomed; in 1934, 17 % of those that did not bloom the first season
survived the winter and bloomed. Results of this study were published by the present writer
(1934) in an article entitled “A statistical study of Daucus carota”, in which the numbers
of pedicels and bracts on annual and biennial inflorescences coming from these seeds were
compared. At the end of the article K. Pearson pointed out that since the seeds were taken
from many plants in the wild, some of the seeds might have come from flowers blooming the
first season and others from those blooming the second season, that one did not know how
many annual and biennial seeds came from the two states and that the comparisons might
not be the same if this was considered.

To overcome this just criticism seeds were taken from one plant near Ann Arbor,
Michigan in 1936. These were planted in 1937 in the greenhouse at Michigan State College and
later planted outside in rows 3 ft. apart and 3 ft. apart in the rows. During the latter part of
the summer, counts were made on plants blooming the first year of the number of branches,
the number of inflorescences, and the number of primary pedicels and bracts per inflorescence
on the stem and first eight branches below the stem terminal cluster. During the summer
of 1938 similar counts were made on the plants blooming the second year. The object of
this article is to compare the counts pertaining to the annual and biennial inflorescences.

During the first flowering season 60 % of the plants bloomed. At the end of this season
the plants which did not bloom appeared to be in good condition for the coming winter. 
In 1938 biennial flowers appeared much earlier than the annual flowers in 1937. Counts of 
the annuals were made in 1937 in August and September; counts of the biennials were made 
in July and the first part of August. 

The terminal inflorescences on the stem will be designated by $T$, the first branch terminal
inflorescence by $A$, the first non-terminal inflorescence on the first branch by $A_1$, etc. Branches
are considered in descending order below the stem terminal. According to these notations,
$D_3$ represents the third non-terminal inflorescence on the fourth branch. Umbels in this
article will always mean primary umbels and pedicels or rays will always mean primary
pedicels or rays.


SIZE OF ANNUAL AND BIENNIAL PLANTS 


The following averages pertain to the number of branches and inflorescences (including
buds) of annual and biennial plants.

Parts                            Annuals (1937)    Biennials (1938)
Average no. of branches               16.3               20.3
Average no. of inflorescences        122.7              282.6




These averages indicate that the biennial plants were much larger as to number of
branches and inflorescences than the annual plants. The second year herbs were considerably
taller than those blooming the first season.

In 1938 most of the branches used in making the counts had four umbels, whose parts
could be enumerated; in 1937 very few of these had four umbels which were mature enough to
use. Very few of the first branches belonging to 1937 plants produced more than two
non-terminal umbels; a good percentage of corresponding branches of 1938 plants possessed more
than two. Counts were made on seventy-seven plants during the first season and on
seventy-six during the second. In the second summer there were several plants with more than 600
inflorescences and one with 796; the largest in 1937 had 183.

In 1937 there was 37.7 % of the herbs with at least eight branches; in 1938 there was
12.4 % with at least eight similar branches. In the first flowering season 46.8 % of the plants
had at least six branches; during the second season 90.8 % had at least six branches. There
were 74.0 % of the annual plants with at least four branches and 97.4 % of the biennials with
at least four. These figures show that the biennial plants were more completely filled out than
the annuals.


SIZE OF INFLORESCENCES


Table 1 contains averages pertaining to the number of bracts per umbel on the stem and
the first three branches. On the average the numbers of bracts on the stem and branch terminal
clusters of biennials are significantly larger than those on similar annual clusters. The average
size (in number of bracts) of stem umbels for annuals was 10·9 bracts; that for biennials was
11·9 bracts. The average number of bracts per branch terminal was less than 10·3 bracts during
1937 and greater than 11·4 bracts during 1938. These figures and figures pertaining to the
first eight branches indicate that the averages of the number of bracts on the majority of
the biennial clusters are significantly larger than similar averages with respect to annual
clusters.




Table 1. Averages and standard deviations pertaining to the number of bracts
per umbel on the stem and first three branches


_ 1937 














11 55 18 — |76 59 49 — 168 
9-9 94 | 86 — |104 | %5 | 95 | — |103 
1:39] 1-22} 101] — 1:66ἱ 1:38] 1-30) — 1-42{ 1-21] L27| — 














1938 





Number 76 76 33 27 A6 30 29 15 

Average 11-9 | 11-4 9-8 110-3 |110 | 11-6 | 10-1 | 10-7 | 11-0 

Standard 1-35} 1:90! 1-43! 1-53) 1-77) 1:20) 1-41) 1:44! 1-16 
deviation 


-1 


7l 
11-9 


110, 1:32] 1-45) 1-23 








30 25 21 
10-2 |109 | 11-2 





Biennial stem terminals had on the average significantly more pedicels than annual stem
terminals; these averages are:

        Annuals              Biennials
     55·6 pedicels         67·8 pedicels

Branch terminals of biennials have on the average significantly more rays than similar ones
on annuals. Fig. 1 allows the eye to see at once how these averages compare; the heights of
the bars on the left represent the averages for the annuals. The bars on the left are shorter
in every case.


[Fig. 1: bar chart; vertical axis, number of pedicels; horizontal axis, branches.]

Fig. 1. Averages pertaining to number of pedicels per branch terminal
umbel for annuals and biennials. a, annuals; b, biennials.


Many of the averages of the numbers of pedicels on non-terminal biennial clusters are
significantly larger than those belonging to corresponding annual clusters. The above
indicates that biennial inflorescences (in number of bracts and pedicels) are significantly larger
than corresponding annual inflorescences, showing again that Daucus carota herbs blooming
the second season are on the average much larger than those blooming the first.

On examining the average number of pedicels per umbel it is found that branch non-terminal
umbels have on the average a smaller number of pedicels than the corresponding branch
terminals; for example, A₁ and A₂ are significantly less (in number of rays) than A. This was
true for the other branches. The 1938 averages for C, C₁, C₂ and C₃ are as follows:

      C              C₁             C₂             C₃
  61·7 rays      45·3 rays      49·3 rays      51·1 rays

Similar figures were found for the other branches. The averages pertaining to pedicels are
shown in Table 2.


Table 2. Averages and standard deviations pertaining to the number of pedicels 
per umbel on the stem and first three branches 


1937 
































^ 1938 














49-32 | 51:14 





48-43 | 60-77 | 47-23 | 49-45 
8-79 | 10-58) 9-93) 10-31 


Average 
Standard 
deviation 













10-60 | 11-12! 11-62 




















CORRELATION


The Pearson linear correlation coefficients between the number of bracts and the number
of rays for various annual and biennial umbels are as follows:

Umbel            T         A         B         C
Annuals        0·408     0·410     0·496     0·571
Biennials      0·655     0·619     0·590     0·549








All of these coefficients are significantly different from zero, showing that there is a definite
relation between the number of bracts and the number of rays. There are no significant
differences between the correlation coefficients pertaining to annuals and biennials except that
for T which is barely significant at the 5% level. These values suggest that the size of the
plant and season do not affect the relation between the number of bracts and rays per umbel.
Similar figures were found in other investigations of this species (Baten, 1934). The position
of the umbels on the plant also does not affect the relation between bracts and rays.
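The comparison of the two coefficients for T can be sketched with Fisher's z transformation. The note does not say which test was used, so this is only one plausible way of checking the statement, and the sample sizes (77 annual and 76 biennial plants) are an assumption taken from the numbers of plants counted in each season.

```python
import math

def fisher_z_test(r1, n1, r2, n2):
    """Two-sided test of equality of two independent correlations (Fisher's z)."""
    z1, z2 = math.atanh(r1), math.atanh(r2)
    # Standard error of the difference of two z-transformed coefficients.
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = (z2 - z1) / se
    # Two-sided tail area of the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p

# Stem terminal T: r = 0.408 (annuals) and r = 0.655 (biennials), from the text.
# n = 77 and n = 76 are assumed, not stated for this particular comparison.
z_T, p_T = fisher_z_test(0.408, 77, 0.655, 76)
print(f"T: z = {z_T:.2f}, two-sided p = {p_T:.3f}")
```

Under these assumptions the statistic comes out at roughly 2·1, just beyond the 5% point of 1·96, which agrees with the description "barely significant".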



The coefficients of correlation between the number of bracts on T and on the other
clusters pertaining to annuals and biennials are about the same and are significant, indicating
a real association between bracts on stem and branch terminals and similarly for rays. The
values of r_TB are:

                   Bracts                      Rays
            Annuals    Biennials        Annuals    Biennials
r_TB         0·644       0·596           0·849       0·741








Size of herb and season have no effect on the relation between bracts and rays on T and on
A, B and C.

The amounts of dependence which the number of bracts on branch non-terminals have on
the number of bracts of terminals for the first three branches were obtained by the correlation
coefficients between these respective numbers. There are no significant differences between
these coefficients, suggesting that the size of plants and seasons do not affect the relation
between the number of bracts and rays on branch terminals and branch first non-terminals.
This also was true for rays.

The following figures are the coefficients of correlation between the number of bracts and
rays on first and second branch terminal inflorescences.


                          Bracts                      Rays
                    Annuals    Biennials        Annuals    Biennials
r_AB (interclass)    0·748       0·734           0·916       0·856
r_AB (intraclass)    0·701       0·710           0·896       0·786

These values indicate no significant differences between the correlations pertaining to annual
and biennial inflorescences. They do suggest a rather high correlation between the number of
bracts and rays on first and second branch primary umbels.

The relation between floral parts on B and T and A is manifested by the following
multiple and partial correlation coefficients.





























                          Bracts                      Rays
Description         Annuals    Biennials        Annuals    Biennials
R_B.AT               0·807       0·796           0·683       0·871
r_BA.T               0·636       0·656           0·662       0·680











Again there are no significant differences between these coefficients, indicating that the
amount of relationship remains the same between these floral parts pertaining to annual and
biennial inflorescences.




SUGGESTIONS FOR FURTHER STUDY


It might be argued that biennial plants should naturally be larger in every way since
these plants had a longer time in which to establish themselves than the annuals; that the
root systems of the second season plants are much better for supporting the plants than those
of the first season. This may be true. To overcome this criticism and to make more reliable
comparisons between annual and biennial plants and inflorescences, it might prove of real
value to secure seeds from one plant as done in this study, save seeds from the annual and
biennial flowers, and plant these seeds under the same environmental conditions and then
make comparisons between the counts made in this study. Seeds should be planted in the
fall and in the spring. Investigations along these lines may produce more interesting results.




SUMMARY 




This study has shown that: 

1. The average number of branches on biennial inflorescences of Daucus carota is larger 
than the average number on annual inflorescences. 

2. The average number of inflorescences on biennials is larger than on annuals. 

3. The average number of bracts per biennial clusters is larger than the average on annual 
clusters. 

4. The average number of primary rays on biennial umbels is larger than that on annual
umbels.

5. The correlation coefficient between bracts and rays is about 0·60 for annual and
biennial clusters.

6. The size of plants and seasons (first and second) do not affect the amount of correlation
between floral parts on stem terminals and branch terminals.

7. The amount of correlation between certain floral parts on branch terminals and
non-terminals is about the same for annuals as biennials.

8. The amount of correlation between bracts and rays pertaining to first and second
branch primary umbels is about the same for annuals as biennials.


REFERENCES
BATEN, W. D. (1933). “A statistical study of Daucus carota L.” Biometrika, 25, 186-95.
—— (1934). “A statistical study of Daucus carota L. II.” Biometrika, 26, 443-68.







BOOKS RECEIVED

Modern Machine Calculation with the Facit Machine Model Lx. By H. SABIELNY,
translated and revised by L. J. COMRIE and H. O. HARTLEY. London:
Scientific Computing Service, Ltd., 23 Bedford Square, W.C. 1. 1939.
Price 5s.

Statistical Method from the viewpoint of Quality Control. By WALTER A. SHEWHART;
edited by W. EDWARDS DEMING. Washington: The Graduate
School, Department of Agriculture. 1939.

Theory of Probability. By HAROLD JEFFREYS, F.R.S. Oxford: At the Clarendon
Press. 1939. Price 21s.

The Probability Integral. By the late W. F. SHEPPARD, F.R.S., being Vol. VII
of the British Association’s series of Mathematical Tables. Completed and
edited by the Committee for the Calculation of Mathematical Tables.
Cambridge: At the University Press. 1939. Price 8s. 6d.

The Races of Europe. By CARLETON STEVENS COON. New York: The Macmillan
Company. 1939. Price 31s. 6d.

Tuberculosis and Social Conditions in England, with special reference to Young
Adults. (A statistical study.) By P. D’ARCY HART and G. PAYLING
WRIGHT. London: National Association for the Prevention of Tuberculosis.
1939. Price 3s.

Statistical Procedures and their Mathematical Bases. By CHARLES C. PETERS and
WALTER R. VAN VOORHIS. New York and London: McGraw-Hill Book
Company, Inc. 1940. Price $4.50.


VOLUME XXXII. PARTS III AND IV


MEDICAL STATISTICS FROM GRAUNT TO FARR 
(Continued *) 










BY MAJOR GREENWOOD 


II. THE STATISTICAL WORK OF GRAUNT 


JOHN GRAUNT'S contribution to our subject has always been regarded as one
of the great classics of science. A few have indeed doubted whether so great 
a work could have been achieved by one whose material success was so modest 
and have sought to transfer the glory to Graunt's highly successful friend Petty. 
This dispute I relegate to an appendix. I assume that Graunt’s published book 
is substantially his own original work. 

The history of the material Graunt used has been written more than once 
and I have nothing to add to Prof. Hull’s story. Graunt had, for a period of more 
than 60 years, arithmetical statements of the numbers of males and females 
christened and buried and of the causes of death (not distinguished by sex)
under some sixty headings. He had no information as to the ages at death. He
had no information as to the number or ages of the living population. 

The first act of a scientific statistician is to assess the trustworthiness of his 
data, to criticize his sources. This tedious preliminary to the doing of sums was 
not much to Petty’s taste. Petty, as we have seen, often used different data to 
reach some conclusion, but hardly ever discusses the reliabilities of the several 
data. Other Fellows of our College since Petty’s day have made the same
mistake. The terrible ‘howler’ committed by Dr William Heberden the younger,
and detected, not without satisfaction, by Charles Creighton is classical.† But
that was not a unique instance. Indeed, even trained statisticians sometimes
confuse names with things. More than one rate of mortality has risen (or fallen)
only on paper. Graunt made no such mistakes.

Graunt’s general argument is that many causes of death are ‘but matters of
sense’, for instance, whether a child were abortive or stillborn, and that in many 
cases the searchers are ‘able to report the opinion of the physician, who was 
with the patient as they receive the same from the friends of the defunct’. But 
sometimes the searchers will be wrong and often enough the error will not matter. 


‘As for consumptions, if the searchers do but truly report (as they may) whether the
dead corpse were very lean and worn away, it matters not to many of our purposes whether
the diseases were exactly the same, as physicians define it in their books. Moreover, in
case a man of seventy-five years old died of a cough (of which had he been free, he might
have possibly lived to ninety) I esteem it little error (as to many of our purposes) if this
person be in the table of casualties, reckoned among the aged, and not placed under the
title of coughs’ (348).*

* The earlier sections were printed in Biometrika, 32, 101-27.
† Creighton, History of Epidemics in Britain, 2, 747-8. Heberden supposed (erroneously) that
‘Griping of the Guts’ of the Bills was Dysentery and had decreased. It was Infantile Diarrhoea
and had simply been transferred to the rubric ‘Convulsions’.

No doubt this brutal common sense might set on edge the teeth of some
Fellows of the College of Physicians even in the seventeenth century, but it was
one of the qualities which made Graunt a pioneer. Making the best the enemy
of the good is a sure way to hinder any statistical progress. The scientific purist,
who will wait for medical statistics until they are nosologically exact, is no wiser 
than Horace’s rustic waiting for the river to flow away. 

Graunt, however, did not accept statements which he had the means of 
testing. Finding in a series of years that of more than a quarter of a million
deaths only 392 were assigned to the Pox, he did not infer that Syphilis had 
been over-rated as a cause of death. 

Forasmuch as by the ordinary discourse of the world it seems a great part of men have,
at one time or other, had some species of this disease, I wondering why so few died of it,
especially because I could not take that to be so harmless, whereof so many complained
very fiercely; upon enquiry, I found that those who died of it out of the hospitals (especially
that of Kingsland, and the Lock in Southwark) were returned of ulcers and sores. And in
brief, I found, that all mentioned to die of the French Pox were returned by the clerks of
St Giles’ and St Martin's in the Fields only, in which places I understood that most of the
vilest and most miserable houses of uncleanness were: from whence I concluded, that only
hated persons, and such, whose very noses were eaten off were reported by the searchers
to have died of this too frequent malady (350).


In principle, the argument is still valid. 

His next example of criticism is the case of Rickets, which first appeared in
the Bills of Mortality in 1634 and then with 14 deaths only, but by 1659 had 
risen to 441. Was Rickets a ‘new disease’ or did an old disease receive, in the 
Bills, a new name? 


To clear this difficulty out of the bills (for I dare venture on no deeper arguments)
I enquired what other casualty before the year 1634, named in the Bills, was most like the
rickets; and I found, not only by pretenders to know it, but also from other Bills, that
livergrown was the nearest. For in some years I find livergrown, spleen, and rickets, put
all together, by reason (as I conceive) of their likeness to each other. Hereupon I added
the livergrowns of the year 1634, viz. 77, to the rickets of the same year, viz. 14, making
in all 91; which total, as also the number 77 itself, I compared with the livergrowns of the
precedent year 1633, viz. 82. All which showed me, that the rickets was a new disease over
and above. Now, this being but a faint argument, I looked both forwards and backwards,
and found that in the year 1629, when no rickets appeared there were but 94 livergrowns;
and in the year 1636 there were 99 livergrowns, although there were also 50 of the rickets:
only this is not to be denied, that when the rickets grew very numerous (as in the year 1660,
viz. 521) then there appeared not above 15 of livergrown. In the year 1659 were 441 rickets
and 8 livergrown; in the year 1658 were 476 rickets and 51 livergrown. Now though it be
granted that these diseases were confounded in the judgment of the nurses, yet it is most
certain that the livergrown did never but once, viz. anno 1630, exceed 100; whereas anno
1660, livergrown and rickets were 536. It is also to be observed, that the rickets were never
more numerous than now, and that they are still increasing; for anno 1649, there were but
190, next year 260, next after that 329 and so forwards, with some little starting backwards
in some years, until the year 1660, which produced the greatest of all (357-8).

* Numbers in brackets are page references to Prof. Hull’s edition of The Economic Writings of
Sir William Petty together with the Observations upon the Bills of Mortality more probably by Captain
John Graunt, Cambridge, 1899.

This is an excellent statistical argument, and, incidentally, evidence that
Graunt wrote his own book, for a physician would probably have suggested that 
the professional interest excited by the classical treatise of Glisson (assisted by 
Regimonter) which was published in 1650 might easily have increased the 
popularity of the diagnosis. Petty, who, with Glisson, was a founder of the 
Royal Society, would hardly have ignored his colleague’s work. 

I cannot resist the desire to mention others which, while of little statistical
importance, have a medical attraction. Graunt noticed that Stopping of the
Stomach first appeared in the Bills of 1636, increased from 6 to 29 by 1647, by
1655 it reached 145, in 1657, 277 and 1660, 314. First he conjectured that
Stopping of the Stomach might be the Green Sickness, ‘forasmuch as I find few
or none to have been returned upon that account, although many be visibly
stained with it’. He thought that possibly Green Sickness might not appear in
the Bills ‘for since the world believes that marriage cures it, it may seem indeed
a shame, that any maid should die uncured, when there are more males than
females, that is, an overplus of husbands to all that can be wives’. Then he
wondered whether Stopping of the Stomach might not be Mother, ‘forasmuch
I have heard of many troubled with Mother Fits (as they call them) although
few returned to have died of them’. But he was diverted by guessing ‘rather
the Rising of the Lights might be it’. He remembered that some women troubled
with the Mother fits did complain of a choking in their throats. ‘Now, as I
understand, it is more conceivable that the Lights or Lungs (which I have heard
called the bellows of the body) not blowing, that is, neither venting out, nor
taking in breath, might rather cause such a choking, than that the Mother should
rise up thither, and do it. For methinks, when a woman is with child, there is
a greater rising, and yet no such fits at all’ (359). He notes that Rising of the
Lights increased in the Bills from 44 in 1629 to 249 in 1660.

Finally, he suggests a correlation between Stopping of the Stomach, Rising 
of the Lights in adults and the Livergrown, Spleen and Rickets of children. 
‘And that what is the Rickets in children, may be the other in more grown 
bodies; for surely children which recover of the Rickets, may retain somewhat 
to cause what I have imagined: but of this let the learned physicians consider, 
as I presume they have’ (359). 

It might be suggested that one item under Stopping of the Stomach could
be surgical, viz. strangulated hernia. Rupture was a heading in the Bills, but
the numbers are small and show no regular increase with the increase of
population. Graunt’s attraction to what used to be called hysterical stigmata is
interesting. One wonders how far these passages reflect conversations with
Petty. It is clear that Graunt had no belief in the peripatetic uterus; Petty
would have had none. The best medical opinion of the age is, of course, that of
Sydenham. Sydenham (whose pathology was traditional) had a pneumatist
aetiology of Hysteria, the origin was an ataxia of the animal spirit (which was
the pneuma zotikon of ancient tradition). He not only believed that Hysteria
might be a serious or even mortal complication of organic disease—as we do
still—but that the ataxic spirits might themselves produce humoral corruption
and lead to chlorosis or ovarian dropsy (Dissertatio epistolaris, 92). So there is
nothing repugnant to the best professional opinion of the age in admitting
Hysteria to the list of causes of death. Nor is there any gross absurdity in the
suggested correlation of increasing Rickets and increasing Hysteria, from the
point of view of a layman. But that surmise does not imply any professional
hint, it rather suggests a belief in a merely physical factor, the pressure of an
enlarged organ. That passage would not have been written by a physician.

These are sufficient instances of Graunt’s criticism of sources—the temptation 
to go on quoting examples must be resisted. I pass to his great achievement, 
the estimation of rates of mortality at ages when the numbers and ages of the 
living were not recorded. For such an estimation to be correct, we all know that 
the population must be stationary, viz. non-increasing, not subject to migration 
and having constant rates of mortality in the several age groups. 

It is a nice point whether Graunt or Petty appreciated the importance of 
these considerations. Graunt was certainly alive to the fact that the population 
of London was growing and that the growth was due to immigration from the
country. The arithmetical position was this. In the earlier years of his series
burials and christenings were about equal in numbers, in 1605 there were 5948
burials and 6504 christenings; in 1625, 7850 burials and 7682 christenings, in
1696, 10,651 burials and 10,034 christenings. Later the burials continued to 
increase, but the christenings either decreased or failed to increase in the same 
proportion. This Graunt attributed to neglect of christening owing to religious 
dissidence and gave excellent reasons for his view. It is clear then that there 
were two factors of increase, immigration and increasing numbers of births. 
Most of Graunt’s deductions are based upon an analysis of the deaths by causes 
for twenty years, 1629-36 and 1647-58, which he selected as years comparatively 
unaffected by plague (of his total of 229,250 deaths only 16,000 were from 
plague). 

If we treat this total as a denominator (or one-twentieth of it) it will, from 
the point of view of calculating mortality ratios, be affected by two errors. The 
deaths of immigrants will make it too large and the increasing births will make 
it too small. Can it be Graunt held that the errors balanced so that, arithmetically 
speaking, one might behave as if one were dealing with a stationary population? 
An alternative explanation is that Graunt did not realize the limitations of the 
method. 

A third possibility is that, although he knew the fallacy, he believed that
the incorrect method gave an approximation to truth sufficient for his purposes.
This is the solution I should be inclined to adopt were I forced to choose.

As I have pointed out above, there is at least a suggestion that Petty did
have some glimmering of the conditions to be fulfilled if a summation of deaths
is to give a correct view of rates of mortality. I do not believe that Graunt was
less informed on any point of vital statistics than Petty. However, all this is
guess-work.

Graunt did not know the ages of the dead; what he did was to pick out
of the list of causes of deaths those which he thought lighted only upon children
‘not more than four or five years old’. He chose Thrush, Convulsions, Rickets,
Teeth and Worms, Abortives, Chrysomes, Infants, Livergrown and Overlaid.
These gave him some 70,000 out of some 229,000. Then he assigned half the
deaths from Small Pox, Swine Pox, Measles and Worms without Convulsions
also to children under six and reached the final conclusion that about ‘36% of
all quick conceptions die before six years old’.
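The arithmetic behind Graunt's estimate can be sketched as follows. The total of 229,250 and the rounded 70,000 come from the text; the number of half-attributed deaths is not given here, so the figure computed below is only the amount implied by working backwards from his 36%, not a count taken from the Bills.

```python
total_deaths = 229_250      # Graunt's twenty-year total, from the text
childhood_causes = 70_000   # "some 70,000": a rounded figure from the text
final_share = 0.36          # Graunt's stated conclusion

# Share of all deaths falling under the purely childhood causes.
share_direct = childhood_causes / total_deaths

# Deaths that the "half of Small Pox, Measles, etc." step would need to add
# in order to reach 36% (a back-calculation, not a recorded count).
implied_extra = final_share * total_deaths - childhood_causes

print(f"childhood causes alone: {share_direct:.1%}")
print(f"implied additional deaths under six: {implied_extra:,.0f}")
```

On these rounded figures the childhood causes alone account for rather less than a third of all deaths, and the half-shares of the mixed causes supply the remainder.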

Is this conclusion—I will not say correct, because we have no data to reach 
a correct result—but of a reasonable order of magnitude? The answer is that 
it is eminently reasonable. Two hundred years after Graunt’s death, William 
Farr printed (in the famous Supplement to the 35th Annual Report of the Registrar- 
General, p. cxxxvi) an outline Life Table for London. This was, of course, com- 
puted by an approximately correct method, using knowledge of the numbers 
and ages of the living population, and reflects the conditions of seventy-five
years ago. Interpolating in this we find that about 32% of ‘quick conceptions 
died before six years old’. There is no good medical reason for holding that the 
conditions of child life in London in the middle of Victoria’s reign were much 
better than in the seventeenth century. The old genius used a bow with a frayed 
string and made no allowance for windage, but his arrow hit the target not far 
from the white. He gave the first quantitative measure of the Herodian sacrifice 
in towns, a sacrifice which was to continue to be offered for more than 200 years. 

Graunt then passed to the other end of life and found that 7 % of the dead 
were ‘aged’. He conceived that the searchers would mean by ‘aged’ persons of 
70 years or upwards, ‘for no man be said to die properly of Age who is much 
less’. His following suggestion that the proportion living beyond 70 might be
used as a measure of healthfulness is not happy. But this calculation may have 
led him to make, or insert, the most famous passage in his book, viz. what is, 
in form, the first Life Table ever published. 

Whereas we have found, that of 100 quick Conceptions about 36 of them die before
they be six years old, and that perhaps but one surviveth 76; we having seven decads
between six and 76, we sought six mean proportional numbers between 64, the remainder,
living at six years, and the one, which survives 76, and find that the numbers following
are practically near enough to the truth; for men do not die in exact proportions, nor in
fractions, from whence arise this Table following (386).


Graunt’s figures are 100, 64, 40, 25, 16, 10, 6, 3, 1. 




The one survivor to 76 is, as Graunt implies, a guess; perhaps he conjectured 
that his seven survivors beyond 70 died one a year. How he calculated his mean 
proportional numbers is unknown. Prof. Willcox conjectured that he experi- 
mented with multipliers of 5/8 and 2/3—the former nearly reproduces the figures 
(see Willcox, Revue de l'Inst. Intern. de Statistique, 5 (1937), 327). Ptoukha
(Congrès Intern. de la Population; Démographie historique, p. 71, Paris, 1937)
ingeniously suggests that he used the multiplier (64 − 1)/100 or 0·63.
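The conjectured multipliers can be compared with Graunt's printed figures by a short calculation. This is only an illustrative sketch: the function name is hypothetical, and the "exact" ratio is simply the constant that carries 64 down to 1 in seven equal steps, which nobody claims Graunt computed.

```python
def survivors(start, multiplier, steps):
    """Successive survivors when each decade keeps a fixed fraction of the last."""
    series = [start]
    for _ in range(steps):
        series.append(series[-1] * multiplier)
    return series

graunt = [64, 40, 25, 16, 10, 6, 3, 1]   # Graunt's figures from age six onwards
exact_ratio = (1 / 64) ** (1 / 7)        # constant ratio taking 64 to 1 in seven decades

candidates = [("5/8", 5 / 8), ("2/3", 2 / 3), ("0.63", 0.63), ("exact", exact_ratio)]
for label, m in candidates:
    print(f"{label:>5}:", [round(x, 1) for x in survivors(64, m, 7)])
print("Graunt:", graunt)
```

With the multiplier 5/8 the first steps give 40 and 25 exactly and the next few round close to Graunt's 16, 10 and 6, but the last two terms stay above his 3 and 1, so no single constant multiplier of this size reproduces the whole column.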

We must, I fear, conclude sorrowfully that this shot did not find the bull's 
eye. If Graunt’s survivors are compared with those shown in Halley's table 
(when correctly used, vide infra), for 100, 64, 40, 25, 16, 10, 6, 3, 1, we should 
have 100, 56, 50, 45, 38, 31, 22, 14, 6. It is possible that child mortality was 
lower in London than in Breslau, but quite incredible that later age mortality 
should have been so enormously higher. 

But, of course, having regard to the data, it would have been more than
genius, it would have been magic, had a correct result been obtained.

Prof. Willcox, whose opinion of Graunt is almost as high as mine, regards the 
passage as inserted on the recommendation of Petty and as Petty’s composition. 
He thinks that it lacks Graunt’s caution and suggests the flighty ingenuity of
his friend. Prof. Willcox’s arguments are weighty, but I am not convinced. 
That Graunt did not—to use the expressive slang—feature his table is true. It is 
also true (vide supra) that passages in Petty’s undoubted writings imply that 
he had some conception of a survivorship table. But—and this is my main 
difficulty—if this were Petty’s idea, I find it difficult to believe that he would 
not have exploited it. Halley, whose economic scent was not so keen as Petty’s, 
saw the epoch-making importance of an idea which was to transform the business 
of selling annuities. It would be odd if Petty had seen it that he did not comment 
upon it. Graunt might well have hesitated, being a cautious statistician, but
surely not Petty. 

However, in spite of modern practice, the writing of history wholly in terms 
of psychology has its pitfalls. 

Let us return to simpler applications of shop arithmetic. The advantages of 
country life over town life from the point of view of both mortality and morality 
had been a commonplace of poets, particularly those Roman poets who spent 
much of their lives in a city, long before the seventeenth century. Graunt was 
the first to apply an arithmetical test of mortality; he compared the statistics 
of Romsey with those of London. For Romsey he had ninety years’ data of 
marriage, christenings and burials. 
His statements about the population of the parish are not quite consistent.
In one sentence he says that it ‘both 90 years ago, and also now, consisted of
about 2700’, but a few lines later says ‘it neither appears by the burials,
christenings, or by the built of new housing, that the said parish is more populous
now, than 90 years ago, by above two or 300 souls’. A little later he says ‘it is
clear that the said parish is increased about 300, and it is probable that 3 or four
hundred more went to London; and it is known that about 400 went to New
England, the Caribe Islands and Newfoundland within these last forty years’
(389). Actually, from an estimate of the number of communicants (which he
assumes to be rather more than half the total population) he makes the average
population between 2700 and 2800. Taking the average of burials for the whole
period to be 58, this gives him a death rate of a little more than one in 50,
which he contrasts with the London figure of one in 32 (apparently based on his
count of 11 families with 88 persons amongst whom 3 deaths occurred in a year;
but this is a rate of one in 29).
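The rates quoted above can be checked by direct division. This is a minimal sketch using only the figures in the text; the helper name is hypothetical.

```python
def one_in(deaths, population):
    """Express a death rate as 'one in N' persons per year."""
    return population / deaths

# Romsey: average of 58 burials a year in a population of 2700 to 2800.
romsey_low = one_in(58, 2700)
romsey_high = one_in(58, 2800)

# London: Graunt's count of 3 deaths in a year among 88 persons.
london_sample = one_in(3, 88)

print(f"Romsey: one in {romsey_low:.1f} to one in {romsey_high:.1f}")
print(f"London sample: one in {london_sample:.1f}")
```

The Romsey figures fall between one in 46 and one in 49, that is, a rate a little above one in 50, while the London count gives about one in 29 rather than the one in 32 that Graunt quotes.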

There is no doubt a certain sketchiness about this, but it was not unreasonable 
to infer that the Romsey rate was much lower than the London rate. 

Graunt found that, unlike London, Romsey had an average excess of
christenings over burials, they were in the ratio of 5 to 4. He estimates that over
the period the natural increase was 1059, and, as will be seen from the quotation
made, he allots about a third of this respectively to London, to the colonies and
to the parish itself. He argues that supposing the population of all England to
be fourteen times that of London and other parishes to send one-third of their 
natural increase to London, then the London burials should increase about 
200 per annum ‘and will answer the increase we observe’. 

Here again the argument is reasonable. He goes on to an investigation which
has been severely criticized. He gives a table of the greatest and least number
of burials in each of the ten-year periods for which he has data. In each decade
but one the maximum is more than twice the minimum. But, he remarks, in
no decade in the London experience is the largest number of burials twice the
smallest number (he excludes deaths from plague from his statistics). ‘Which
shews, that the opener and freer airs are most subject both to the good and bad
impressions, and that the fumes, steams and stenches of London do so medicate
and impregnate the air about it, that it becomes capable of little more, as
if the said fumes rising out of London met with, opposed and jostled backwards
the influences falling from above, or resisted the incursion of the country
airs’ (392).

Prof. Hull shook his head over this passage. ‘This is an attempt to explain
by physical conditions the wide range in the observed country death rate which 
is really due to the narrowness of the field—a single market town—under 
investigation. It is perhaps the dba statistical mistake that can be charged 
against Graunt’ (lxxvii).

I do not like to leave a hero in the lurch. I must admit that if both Romsey
and London burials were samples from a Poisson universe, the fact that the 
Poisson parameter for London was at least a hundred times that for Romsey 
would make it incredible that the London range, in terms of the mean or of the 
standard deviation, should be so wide as that for Romsey. But Prof. Hull was
wrong in supposing that the wide range in the Romsey rates was due to the
narrowness of the field of observation in a statistical sense.

Taking Graunt’s 58 as the ‘expected’ annual deaths then, as 1/58 is small,
the Poisson distribution is not far from the symmetry of a normal curve, and
using the results of Tippett and E. S. Pearson, we may conclude that the
expected range would be 23.45 ± 6.073. The observed ranges for the successive
decades are 32, 48, 78, 23, 65, 39, 121, 91, 52. All but one are greater than the
expectation and six diverge by more than three times the standard error.
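
The figures in this paragraph can be checked roughly as follows. This is a sketch only: it assumes the normal approximation to the Poisson, ten yearly counts to each decade, and the usual tabulated constants for the range of ten normal observations (mean about 3.078 and standard deviation about 0.797, in units of the population standard deviation). The variable names are illustrative.

```python
import math

mean_deaths = 58                    # Graunt's average annual burials at Romsey
sd = math.sqrt(mean_deaths)         # Poisson: the variance equals the mean

# Mean and S.D. of the range in samples of 10 from a normal population,
# in units of the population S.D. (assumed tabulated constants).
MEAN_RANGE_N10 = 3.078
SD_RANGE_N10 = 0.797

expected_range = MEAN_RANGE_N10 * sd
se_range = SD_RANGE_N10 * sd

# Observed ranges (greatest minus least burials) for the successive decades.
observed = [32, 48, 78, 23, 65, 39, 121, 91, 52]
n_above = sum(r > expected_range for r in observed)
n_beyond = sum(abs(r - expected_range) > 3 * se_range for r in observed)

print(f"expected range {expected_range:.2f}, standard error {se_range:.2f}")
print(f"{n_above} of {len(observed)} exceed expectation; {n_beyond} by more than 3 s.e.")
```

With these assumed constants the expected range and its standard error agree closely with the values quoted, and the counts of eight decades above expectation and six beyond three standard errors are reproduced.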

Something more than small numbers is involved. Still, it must be confessed
that Graunt did not anticipate the reasoning of James Bernoulli, although an
intuition of genius may have led him to think that something more than ‘chance’ 
had play here. 

Graunt devoted special attention to the demographic influence of the plague. 
In the first place, he remarks that the attribution ‘plague’ understated the
mortality due to plague. He infers this from the fact that in plague years burials
from other causes exceeded the average greatly, ‘from whence we may probably
suspect, that about 1/4 part more died of the plague than are returned for such’.
Next he inferred that after a great outburst plague lingered for several years.

The plague of 1636 lasted twelve years, in eight whereof there died 2000 per annum
one with another, and never under 300. The which shows that the contagion of the plague
depends more upon the disposition of the air than upon the effluvia from the bodies of
men. Which also we prove by the sudden jumps which the plague hath made, leaping in
one week from 118 to 927; and back again from 993 to 268; and from thence again the very
next week to 852. The which effects must surely be rather attributed to change of the air,
than of the constitutions of men’s bodies, otherwise than as this depends upon that (366).


Finally, he observes that within two years the city was re-peopled; a deduction
from the time taken for the number of christenings to reach again the level of
a pre-plague year.

We may, if we please, smile at Graunt’s epidemiological inference. But it is
a reasonable inference from the facts when we remember that in Graunt’s day— 
in spite of Fracastorius—contagium was not thought of as contagium vivum, 
but as a mere sympathetic vibration or passing on of something. 

I have, I hope, given an adequate sample of Graunt’s quality, but have not 
mentioned the most famous of all his deductions. Both in London and the 
country, on the average more males were christened than females, but more
males died young or entered celibate occupations. So we reach this conclusion: 


We have hitherto said, there are more males than females; we say next that the one 
exceed the other by about a thirteenth part. So that although more men die violent deaths 
than women, that is, more are slain in wars, killed by mischance, drowned at sea and die 
by the hand of justice; moreover more men go to colonies and travel into foreign parts
than women; and lastly, more remain unmarried than of women as fellows of colleges, 
and apprentices above eighteen etc. yet the said thirteenth part difference bringeth the 
business but to such a pass, that every woman may have an husband, without the allowance 
of polygamy (375). 




The story of how this arithmetical justification of God's providence attracted 
the attention of Derham, of how Derham’s book fired the enthusiasm of the 
Prussian Army chaplain Johann Peter Siissmilch and of how Siissmilch’s book 
influenced Malthus has been well told by Hull. I should not myself rank this 
section high among Graunt’s researches. From a demographic point of view 
neither judicial hangings nor college fellowships could have had much effect in 
reducing the male excess.

Even copious quotation fails to convey the spirit of a complete book. I have 
quoted good things, but many more remain. Graunt revealed sundry important
truths, and not the least important was that very imperfect data, if patiently
considered, will tell us something it is good for us to know. If young medical 
officers going to parts of the empire where organized medical and demographical 
information is at no higher a level than that of seventeenth-century England— 
and there are many such places—were restricted to a single book on statistics, 
I should advise them to take not a modern scientific work, but old John Graunt’s 
Observations. 


APPENDIX 
Did Graunt write the book published over his name?


John Graunt and William Petty were, as we have seen, close friends. Graunt,
as the world judges success, failed and Petty succeeded. But by the judgment
of scientific men in the seventeenth century, and ever since, the order of intellectual
precedence was reversed. From the moment of publication a few
discerning people perceived the originality and importance of the Observations, 
the same people who, while admiring Petty’s verve, ingenuity and worldly 
success, did not take over-seriously his bright ideas. 

But Graunt was a man of one book. Save a note upon the multiplication of
carp and the growth of salmon, he published nothing more. Petty went on 
writing, scheming and talking for thirteen years after Graunt’s death. That often 
enough in that period, the Observations were discussed over the wine—as they 
were quoted in Petty’s writings—we may suppose. That Graunt discussed his 
work with Petty both before and after publication we may also take for certain, 
although we have no formal proof of it. The country statistics which Graunt 
first used were from Petty’s native parish and even if we are not disposed (as
certainly I am not) to give much weight to particular turns of phraseology, still
there are sufficient verbal oddities in some pages of Graunt’s book to suggest 
Petty’s hand. 

In these circumstances, it would not be very surprising if Petty’s associates,
particularly those who were not good judges of statistical work, were to conclude
that Petty’s share in the remarkable achievement of Graunt were greater than
appeared. It is not even judging Petty too harshly to suppose that he himself
might come to share the opinion. There is no evidence that Petty ever did
explicitly claim the credit. In one list of his writings (one of four), found among
Petty’s Papers, he did include the Observations, which at least is evidence that 
he thought himself entitled to a share of the credit. We may suppose that if, 
in familiar intercourse, somebody had said ‘Come, confess Sir William, yours 
was the hand that guided the pen of poor John Graunt’, he might not have 
denied it very strenuously. I think I have produced evidence enough that Petty 
did not under-rate his powers and was not conspicuous for delicacy of feeling. 

My guess would be that long before his death he did come to believe that 
Graunt’s intellectual success was due to his help. 

Whether Petty believed this or not, it is certain that friends and associates
of Petty began to believe it soon after Graunt’s death, and the belief has been
entertained by a few people in each generation since. These, with one conspicuous
exception, have been drawn from Petty’s friends or descendants or
from literary critics.

In the seventeenth century, of Petty’s circle, Evelyn, Southwell and Aubrey 
believed or said that Petty wrote or inspired Graunt’s book. Two Fellows of the 
Royal Society, Houghton and Halley, also attributed the book to Petty. The
only one of the five who was certainly a competent judge of scientific merit was
Halley. Halley began the memoir which contains his Breslau table with these
words:


The contemplation of the mortality of mankind has, besides the moral, its physical and 
political uses, both which have some time since been most judiciously considered by the 
curious Sir William Petty, in his moral and political Observations upon the Bills of Mortality 
of London, owned by Captain John Graunt. And since in a like treatise on the Bills of 
Mortality of Dublin....But the deductions from those bills of mortality seemed even to 
their authors (sic) to be defective. (Phil. Trans. no. 196 (1693), p. 596.) 


Since the seventeenth century, there has been unanimity among demographic
statisticians and economists that Petty could not have written Graunt’s
book. Halley was quite as good a judge of scientific merit as any of them and 
a contemporary of the canvassed writers; if I were sure that he had read and 
compared Petty’s acknowledged works with the Observations I should prefer his 
opinion to that of other ‘experts’—including, of course, my own. Halley’s 
direct testimony, in the sense of a court of law, would be valueless; he was only 
six years old when the Observations were published and became a Fellow of the 
Royal Society five years after the death of Graunt. There is no evidence that 
either before or after the period of writing and publishing his famous memoir, 
Halley worked on demography. After his memoir, but in his lifetime, a new 
epoch in mathematical vital statistics began. De Moivre, eleven years younger 
than Halley, brought out his principal works in the lifetime of Halley (1656-1742)
and used Halley’s table. The two men must have been well acquainted,
for both were enthusiastic disciples and intimate friends of Newton, but Halley,
like Graunt, made only one contribution to the literature of demography.

So it may be doubted whether Halley were sufficiently interested in demographic
or economic writings to have read Petty’s tracts at all. Also in the
passage cited above (apart from the writing of ‘authors’ not ‘author’) the collo- 
cation of the Observations on the Dublin Bills with those on the London Bills is 
curious. There is no doubt that the Observations on the Dublin Bills were the 
work of Petty, and in the first edition of them they are stated on the title-page 
to be by the ‘Observator on the London Bills of Mortality’. But this, as Prof. 
Hull pointed out (xlii), was probably a catch-penny device of the publisher,
Mark Pardoe, to draw a public which had just taken a fifth edition of the London 
Observations. Actually the book did not sell, and when the publisher reissued an 
enlarged version, Petty’s name appeared on the title-page without any reference 
to the London Observations. I conclude that Halley’s evidence is less weighty
than it seemed. He may well have had before him copies of Graunt’s book and 
of the two editions of the Dublin Observations. Having no other knowledge of 
the literature he would naturally enough write as he did. 

If we eliminate Halley, no other expert countenanced Petty’s authorship and
one, Augustus de Morgan, gave an amusing but quite cogent reason for dismissing
the notion.

In speaking of the variations in the annual numbers of deaths attributed to
Rickets, Graunt said:


Now, such back-starting seem to be universal in all things: for we do not only see in the 
progressive motion of wheels of watches, and in the rowing of boats, that there is a little
starting or jerking backwards between every step forwards, but also (if I am not much 
deceived) there appeared the like in the motion of the moon, which in the long telescopes 
at Gresham College one may sensibly discern (358). 


De Morgan (Budget of Paradoxes, 68; Assurance Magazine, 8, 167) commented
on the improbability that ‘that excellent machinist, Sir William Petty, who
passed his day among the astronomers’ would attribute to the motion of the
moon in her orbit all the tremors which she gets from a shaky telescope.

Down to 1927 the matter was regarded, in scientific circles, as settled. In
that year the late Marquis of Lansdowne published a copious selection of the
Petty Papers with what he regarded as new evidence in favour of Petty.

The only new evidence of a direct kind was a manuscript list in Petty’s hand 
of his writings or projected writings which included the Observations. There are
three other lists which do not include the Observations, and if we are to suppose
that the entry really referred to the book published under Graunt’s name, then 
we must believe that in 1685 and in 1686 Petty had forgotten his best title to 
scientific immortality. The remainder of the evidence consists of parallel passages 
and ad captandum arguments to the effect that it was more probable that a 
physician had written on questions of medical statistics than a tradesman. This
publication led to a lively controversy. Of the merits of this, I, as a party to it,
am not an impartial judge. Purely literary arguments do not appeal to me when
the question is of scientific method. Thus, Dr L. F. Powell attached weight to
the fact that Dr Johnson in conversation had attributed to Petty an observation
(not statistical) which is made not in Petty’s writings but in Graunt’s book.

In the discussion the word ‘style’ is used in different senses by the combatants. 
The statisticians are thinking of scientific method, the literary critics of verbal 
arrangement. To the former the fact that, particularly in the conclusions and 
the Appendix, Graunt’s book has turns of phraseology which suggest Petty’s
hand, seems of little importance. To the latter it seems very significant. 

In the article by Prof. Willcox, which I have quoted above, the controversy 
is reviewed, and the author concurs generally with his statistical predecessors.

Prof. Willcox does, however, differ from his predecessors in one important 
particular. He holds that the famous life table was supplied by Petty. He argues 
that this is far too conjectural to have been the work of so cautious a reasoner 
as Graunt:


In attempting to reconstruct its origin I have surmised that after Graunt had estimated 
that 36 per cent. of the deaths were due to children’s diseases, that they all occurred under 
the age of six, and that the seven per cent. who were reported to have died ‘aged’ died at 
over 70 years of age (at one place he says over sixty), he felt unable to go further and 
reported his difficulty to Petty, already perhaps speculating about a series of similar 
problems. 

Petty guessed at the number of survivors at the end of each decennial age period,
6-15, 16-26 etc. incidentally and characteristically ignoring Graunt’s theory that seven
per cent. survived seventy, and assuming instead, without reason, that one per cent. survived
seventy-six and not one per cent. eighty-six, and that the survivors at age six decreased with 
each age period in a geometrical progression approximately equal to the 64 per cent. which 
Graunt had set for the first group (326-7). 


Prof. Willcox’s argument is cogent. It may be strengthened by a criticism
of the late Prof. Westergaard (Contributions to the History of Statistics (London,
1932), p. 23). In using this table, Graunt made a serious blunder. In order to
estimate the number of men of military age in London he subtracted the number
alive at age 56 from the number alive at age 16. But this simply gives him the
number dying between those ages; what he wanted was some average of the $l_x$’s. It
is evident that Graunt was not at all clear in his mind as to how to use a life table.
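
The distinction between the two quantities may be illustrated numerically. The sketch below assumes the survivorship figures usually quoted for Graunt’s table (100, 64, 40, 25, 16, 10, 6, 3, 1, 0 at ages 0, 6, 16, ..., 86); only the values at ages 6, 76 and 86 are confirmed by the passage quoted above, and the trapezoidal rule is merely one possible ‘average of the $l_x$’s’, chosen for illustration.

```python
# Survivors out of 100 births at each age in Graunt's table
# (assumed values, as commonly quoted; not given in full in the text).
ages = [0, 6, 16, 26, 36, 46, 56, 66, 76, 86]
survivors = [100, 64, 40, 25, 16, 10, 6, 3, 1, 0]
l = dict(zip(ages, survivors))

# Graunt's subtraction: this is the number DYING between 16 and 56.
deaths_16_to_56 = l[16] - l[56]

def person_years(a, b):
    """Trapezoidal estimate of years lived between ages a and b (both tabulated)."""
    i, j = ages.index(a), ages.index(b)
    return sum((ages[k + 1] - ages[k]) * (survivors[k] + survivors[k + 1]) / 2
               for k in range(i, j))

# Share of a stationary population aged 16 to 56: an average of survivors.
share_16_to_56 = person_years(16, 56) / person_years(0, 86)

print(deaths_16_to_56, round(100 * share_16_to_56, 1))
```

Under these assumptions the subtraction gives 34 per 100 births, whereas the share of the living population between the two ages comes to roughly 41 per cent; the two are different quantities and need not agree.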
On the other hand, if this table were really Petty’s idea, it is hard to understand
why he did not exploit it. If Petty had been a Halley, the explanation
would be obvious. The table is wrong; the conditions for the validity of the
method were not fulfilled. There is indeed (vide supra) some evidence that Petty
did know what data were necessary in order to construct a proper life table. 
One seems on the horns of a dilemma. If Petty thought the table was correct 
why did he make no further use of the method? If he thought it was wrong, 
would he have urged Graunt to insert it? 




Although Prof. Willcox has certainly shaken my previous conviction, I still
feel reluctant to surrender Graunt’s table to Petty. However, there may be an
element of sentimentality in this. At least the statisticians agree that the answer
to the question which I have placed at the head of this Appendix is emphatically
yes.


REFERENCES TO THE RECENT CONTROVERSY IN CHRONOLOGICAL ORDER


1927. LANSDOWNE, Marquis of. The Petty Papers.
1928. GREENWOOD, M. J. R. Statist. Soc. 91, 79.
1928. LANSDOWNE, Marquis of. The Petty-Southwell Correspondence, pp. xxiii-xxxii.
1932. LANSDOWNE, Marquis of. The Times Literary Supplement, 8 Sept.
1932. BRETT-JAMES, N. G. The Times Literary Supplement, 15 Sept.
1932. GREENWOOD, M. The Times Literary Supplement, 22 Sept.
1932. LANSDOWNE, Marquis of. The Times Literary Supplement, 13 Oct.
1932. POWELL, L. F. The Times Literary Supplement, 20 Oct.
1933. GREENWOOD, M. J. R. Statist. Soc. 96, 76.
1937. WILLCOX, W. F. Revue de l’Inst. Intern. de Statistique, 5, 321.


IV. HALLEY'S LIFE TABLE 

The long and fruitful life of Edmund Halley (1656-1742) belongs to the
general history of science; of him it may indeed be said nihil quod tetigit non
ornavit. He made only one contribution to our subject, but it was of first-rate
importance. 

The circumstances of this undertaking are obscure; Halley would have 
perceived the imperfections of Graunt’s life table, but it is not known whether 
it was he who set on foot a search for better statistical material than Graunt 
had had. Inquiries were however made, and made after he had become a Fellow
of the Royal Society, so it is at least possible that Halley, who had travelled 
extensively in Europe (he was at Danzig in 1679 and in Italy in 1681), suggested 
that something might be found abroad. By 1691, the King’s Librarian, Henry 
Justel, who was in touch with the Society, had been brought into communication, 
possibly through Leibniz, with Caspar Neumann, a scientifically minded evangelical
pastor of Breslau. Neumann supplied the data which Halley used.

In 1883, J. Graetzer, a medical-statistical official of Breslau, published a 
little monograph* which throws light upon the work. He not only extracted from 
the Breslau archives all the data which were or might have been communicated 
to the Royal Society but had the Society’s archives searched, with the result 
that a letter from Neumann to Justel and another from Neumann to Halley, 
both with statistical appendices, were discovered. Thanks to the labours of 
Graetzer and an essay by R. Böckh (Bulletin de l’Inst. Intern. de Statistique, 7
(1893), 1) we can form a reasonably clear idea of Halley’s method, which was
not what those who have not examined the literature suppose it to have been.

* Graetzer, J., Edmund Halley und Caspar Neumann, 1883.

It is often stated that Halley, having found that during the five years of
observation the number of births only slightly exceeded the number of burials,
treated the population as stationary and constructed a life table by a simple
summation of the deaths in the manner already explained. He was much wiser. 
What he tried to do was to construct a population table, in the following way. 
Suppose we know how many children were born in a calendar year, say 1690, 
in a town not subject to migration which maintains accurate registers of ages 
at death, and then we discover how many of the children born in 1690 will be 
alive on each successive first of January by a series of subtractions. We shall 
have the survivors on 1 January 1691 by subtracting those of the children born 
in 1690 who died in 1690. We shall have the survivors on 1 January 1692 by 
subtracting the deaths in 1691 occurring among the survivors to 1 January 1691, 
and so on. This will give a precise enumeration of the living population, and this 
is what Halley wanted. The figures we shall obtain will not be the conventional
$l_x$’s of a Life or (as German writers say) Mortality Table, but what in most
modern books are represented by the capital letter $L$ or years of life lived or
“persons” living between the termini (see Appendix). If the population is
stationary, the sum of these figures gives the population of the place under
study. Now for ages between 1 (last birthday) and very advanced ages, $L_x$ is
simply $l_x$ diminished by $\tfrac{1}{2}(l_x - l_{x+1})$. In the first year of life (and at advanced
ages) the difference is greater. Thus in the first year of life deaths are not evenly 
distributed throughout the year of life, more than 70% of them occur in the 
first six months of life, so that instead of subtracting half the deaths we must 
subtract nearly three-quarters. Halley himself assigned 68% of deaths in the 
first year of life to the first half of the first year. The reason why Halley proceeded 
in this way was that he knew the population not to be stationary. His idea was 
to obtain the figures for the first few years of life accurately—indeed just as 
they are now obtained—and then to correct for excess of births over deaths. 
His masterly plan was partly defeated by the fact that his Breslau corre- 
spondent Neumann was not so good a statistician as he was. Halley’s letter to 
Neumann has not been preserved, we have only Neumann’s answer of 1 March 
1694. Probably Halley asked Neumann to send him (as a check on the calculations
he had already made) the exact numbers of survivors on 1 January for
five years, of births in a calendar year. Neumann did send him a table, but the
table, as Graetzer pointed out, is wrong. Neumann gave the correct figures for 
1 January of the first successive year, incorrect figures for the other years. 
Between 1 January of the year following the year the births of which are under
study, and the next first of January, some of those born in the starting year will 
die under and some over a year of age. Neumann merely deducted the former, 
so he has too many survivors. To reach the right figure would have meant taking 
more trouble and he did not appreciate the importance of this. Böckh (whose
opinion of his statistical contemporaries has a tinge of bitterness rare, of course,
in other scientific pursuits) remarks that it was not strange Neumann should
miss the point as it had been missed by many statisticians long after his time.
It is at least clear that Halley had realized an important truth which did not
become part of even expert knowledge for more than a century.

The precise arithmetical details of Halley’s work are not perhaps of much 
medical interest. Graetzer and Böckh have done a good deal to clear it up. The
data used (an average of five years) had 1238 births and 1174 deaths and the 
table accounts for 1238 deaths. Halley must therefore have had a plan for 
increasing deaths. It is likely, from an observation he makes on mortality in 
Christ’s Hospital, that he did not wholly depend on the Breslau figures. Graetzer 
suggests that Halley may have made two graphs, one having an ordinate of 1238 
at the origin and an ordinate of 64 at the oldest age, the other an ordinate of 
1174 at the origin and 0 at the oldest age and that he plotted the survivors for 
each graph based on recorded deaths and drew a curve passing through 1238 
and 0 between these graphs. It may be so. Using the original material which 
Graetzer published, Böckh recalculated the table. The results do not, except at
ages over 60, differ materially from Halley’s. So far as concerns the mean after
life time (expectation of life), Halley’s table gives 27.54 years at birth, Graetzer’s
material 27.69. For ages under 40 the re-working gives slightly lower and for
later ages higher mortality. It may be noted that Halley’s table gives appreciably
higher mortality in childhood than Graunt’s, more than 43% instead of
36% are dead by the age of six years. But Graunt’s method would exaggerate
mortality (so would Halley’s method, but, owing to his precautions, not so 
greatly). On the other hand, Graunt’s estimate of age is only an intelligent 
guess. Actually, as Graetzer showed, the infant and child mortality shown by 
Halley’s table differed little from the observed rates of mortality in the city of 
Breslau in 1876-80.

It has been said that Halley was not greatly interested in the medical aspects 
of his work. After describing methods of calculating the prices of annuities, he 
has the following passage*:

It may be objected that the different salubrity of places does hinder this proposal from
being universal; nor can it be denied. But by the number that die being 1,174 per annum
in 34,000 it does appear that about a 30th. part die yearly as Sir William Petty has
computed for London; and the number that die in infancy is a good argument that the air
is but indifferently salubrious. So that by what I can learn, there cannot perhaps be one
better place proposed for a standard. At least ’tis desired, that in imitation hereof the
curious in other cities would attempt something of the same nature, than which nothing
perhaps can be more useful.

That the mortality of childhood depends upon the atmosphere is not so 
foolish an hypothesis as it may seem to us. Halley lived before breast-feeding 
became the exception rather than the rule. The ‘curious’ in other cities had not
the wit to follow his advice. He made no other contribution to the science of
vital statistics; a gain to astronomy but a heavy loss to demography.


* I have read Halley's paper in the collection of papers, many by him, collected under 
the title Miscellanea Curiosa, printed in London in 1705; the quotation is from p. 800.




APPENDIX


Halley’s table is printed in two columns, the first headed ‘Age current’, the 
second ‘Persons’. Thus: 


Age current   Persons
     1          1000
     2           855
     3           798

and so on.


A mistake sometimes made is to suppose that Halley meant by age current 
simply the end of each year of life and that the entry against each ‘age current’ 
is the number of survivors at exact age one year less than the ‘age current’, 
viz. that of 1000 born 855 survived to the first anniversary, 798 to the second 
anniversary, etc. The fact that Halley uses the round number 1000 for a first 
entry does something to encourage the mistake among readers who have not 
consulted the original paper and it is sometimes made by people who should
know better. It is actually a terrible ‘howler’, leading to a wholly false view of
rates of mortality in early life. Thus if 1000 and 855 were really the first two
entries of a Life Table as set out now, then, as the first two entries in English
Life Table no. 7 Males (mortality 1901-10) are 1000 and 856, we might conclude
that mortality in the first year of life was no lower in 1901-10 than in Breslau
in the last years of the seventeenth century. But the 1000 of Halley’s table is
not the number of new-born children but the average number out of 1238 born
living between the ages of 0 and 1. This is what is called the $L_x$ of a modern
table or the population living between the ages $x$ and $x+1$. If we have a column
of $L_x$’s, which is what Halley gives us, we can deduce therefrom the more
familiar $l_x$’s provided we know the starting value and the number of deaths in
the first year of life. Halley gives both items. He says that of 1238 annual births
348 die annually. So that his $l_0$ is not 1000 but 1238, and his $l_1$ is 890. He chose
1000 for $L_0$ by assuming that of the 348 deaths in the first year of life 238 occurred
in the first six months of life, 68%. This differs very little from the modern
practice; in Life Table no. 7 quoted above 73.5% of the deaths in the first year
of life are assigned to the first six months of life. Having been given $l_0$ and $l_1$
we can deduce the other $l$’s from the values of the $L$’s which Halley gives
because, after the end of the first year of life there is little error in supposing
that the deaths between two birthdays are evenly distributed over the year;
so, for instance, $l_2$ will be equal to $L_1$ less half the difference between $l_1$ and $l_2$
and proceeding in this way we put Halley’s table into modern form. I attach
a table calculated by Böckh.

It will be seen that, if Halley’s table is properly used, the comparison is not
of 1000 and 855 with 1000 and 856 but of 1000 and 719 with 1000 and 856.
Actually this is still slightly optimistic, because I am comparing ‘persons’
with males. The ‘persons’ figure for 1901-10 is 1000, 869. On the other hand,
in the Breslau data still births (or some of them) are included in births, so that
the mortality is slightly exaggerated. If for instance 7% are still born, the
survivors to 1 will be the same, but the $l_0$ should be reduced to 930. Or alternatively
we should write 1000 and 773.
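
The conversion just described can be sketched in a few lines of Python. Only the first three of Halley’s entries (1000, 855, 798) and the figures 1238 and 348 are taken from the text; the variable names, and the treatment of the 7% allowance for still births as a simple rescaling of the births, are assumptions made for illustration.

```python
births = 1238             # l_0: annual births, as Halley states
infant_deaths = 348       # deaths in the first year of life
L = [1000, 855, 798]      # Halley's 'Persons' at ages current 1, 2, 3 (L_0, L_1, L_2)

# l_0 and l_1 are given directly; L_0 is not needed for the later values.
l = [births, births - infant_deaths]

# After the first year deaths are taken as evenly spread over each year:
# L_x = (l_x + l_{x+1}) / 2, hence l_{x+1} = 2 * L_x - l_x.
for Lx in L[1:]:
    l.append(2 * Lx - l[-1])

# Rescale to a radix of 1000 births, the modern convention.
per_1000 = [round(1000 * v / births) for v in l]

# Hypothetical allowance: if 7% of recorded births were still-born,
# survivors to age 1 per 1000 LIVE births.
stillborn_share = 0.07
l1_live = round(1000 * l[1] / (births * (1 - stillborn_share)))

print(l, per_1000, l1_live)
```

The second entry of `per_1000` is the 719 of the text, and `l1_live` the alternative figure of 773.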
I attach Halley’s table reduced to modern form and with the corresponding
expectations of life calculated by Böckh (I have reworked some of the values
from the data and agree with Böckh’s figures).


Halley’s Table, expressed in modern form together with the Expectations of Life
at quinquennial intervals (Böckh)











V. GUESSING THE POPULATION 


My object is to trace the growth in our country of that part of statistical 
science which is of interest to students of medicine or public health. In speaking 
of such pioneers as Graunt, Petty and Halley it was proper to construe the 
obligation rather freely. Both Graunt and Petty did clearly perceive the relevance 
of their researches to matters of public health or even clinical medicine, but much 
of what Petty did had a more direct bearing upon political questions than those 
of public health. Again, the life table is a way of expressing the facts of mortality 
which is valuable in some medical researches, but its importance as a statistical 
instrument has been much greater in non-medical than medical circles, above 
all of course in the financing of assurance business. The commercial importance 
of life tables was perceived by Halley and by other mathematicians of his and 
the following generations. 

Looking at the position after Halley’s publication it was clear that progress 
might be made (1) in improving the accuracy of the life table, viz. by obtaining 
data more relevant to the conditions of life of persons who assured their lives 
or bought annuities, (2) in simplifying the very laborious calculations which the 



220 Medical statistics from Graunt to Farr


determination of praemia or purchase values required. Under (1) no progress
worth speaking of was made in England until the end of the eighteenth century.
This was partly due, as we shall see, to a not entirely unjustified disbelief in the
powers of the medical profession to change the rate of mortality, partly to
ignorance. No first-rate English mathematician after Halley gave any critical
attention to the theory of the Life Table before the nineteenth century. Under
(2) considerable progress was made, but this progress is of little or no medical
interest and to describe it would involve entering upon tedious arithmetical and
algebraical detail. The primary medical-statistical quaesita are correct
enumerations of deaths by sex at ages and by causes, and of the numbers living in sex
and age groups. When these have been satisfied, the medical statistician can get
to work.

For 150 years after Graunt's death very little was done to improve matters. 
Down to 1801 the population as a whole had not been counted; forty years more 
passed before a reasonable age distribution was secured, and it was thirty-eight 
years after the first denominator (populations) that the first numerator (deaths) 
of the fundamental fractions was obtained. Until 1801 intelligent guessing was 
the method and the guesses of the eighteenth century deserve a few pages, if 
only because they prove that statistical ability is as rare as other kinds of ability 
and that wishful thinking is not a modern foible. 

The first estimator to mention belongs to the seventeenth century and was 
a younger contemporary of Graunt and Petty, Gregory King (1648-1712). He 
was born in Lichfield, the son of a land surveyor. At the age of fourteen he was 
recommended as a clerk to the famous herald Dugdale with whom he worked 
for several years; after Dugdale had finished his Visitation, King worked for 
various amateur antiquaries and was eventually invited by a lady of property
in Sandon (Staffordshire) to be her steward, auditor and secretary. Here he
remained until 1672 when he moved to London and, no doubt through Dugdale's
recommendation, had a considerable amount of employment in both heraldic
work and ordinary surveying. In 1677 he became a member of the College of 
Arms, in which he attained the rank of Lancaster Herald and so continued until 
his death, but worked for other official bodies on financial subjects. 

The decorous memoir by George Chalmers, from which I have extracted 
these particulars, does not give us a very life-like picture of the man himself.
There is a certain likeness between the early careers of Petty and King. King 
was not indeed shipped as a cabin boy, but Mr King (the elder) drank (if we may 
venture so coarse an abbreviation of Chalmers's statement that the father studied
and practised his profession ‘with more attention to good fellowship than 
mathematical studies generally allow’) and King junior was a pupil teacher at 
eleven. If he really read Hesiod and Homer, made Greek verses and taught 
himself to survey land in his thirteenth year he must have had Petty's precocity. 
Both Petty and King had experience of practical surveying and, of course, both 




were interested in political arithmetic. But there the parallel ends. King was
a professional surveyor and archivist and had a reasonably successful professional
career. Petty was – Petty. One might, perhaps, adduce as another parallel that
King made some enemies and thought himself ill-used. But the job by which
Sir John Vanbrugh, a stranger to the College of Arms, was made a king-at-arms
over the head of an official of twenty years’ standing would have galled the
meekest of mankind. One may safely conclude that King had more knowledge
of the data of political arithmetic than Petty and less originality. His vital
statistical work was not published until nearly a century after his death, as an
appendix to the second edition of An Estimate of the Comparative Strength of
Great Britain by George Chalmers, London, 1803. Perhaps he never intended
to publish it (he communicated the substance to his contemporary Davenant),
and this may explain why there are no details of how some of his results were
reached. The report reads rather like a document prepared for official use by
persons interested in results not methods.

The starting-point of King's attempt to estimate the population was a return 
from the Hearth Office of the number of houses assessed to tax on Lady Day, 
1690. That was 1,319,215 which, King estimated, had increased to 1,326,000 by 
1695. He deducted 30,000 for empty divided houses,* took the round figure of
1,300,000 and assigned 105,000 to the London area, 195,000 to other cities and
market towns and 1,000,000 to villages and hamlets. He used a series of
multipliers, 5.4 for a house within the walls of London, 4.6 for a house within the
liberties, 4.4 for the out parishes in Surrey and Middlesex and 4.8 for Westminster.
For other towns, his multiplier was 4.3 and for villages 4.0.

Having performed his multiplications he gives London a bonus of 10%,
other towns 2% and villages 1%. Lastly he estimates homeless people to
number 80,000. The final result to the nearest round number is 5½ millions.
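King's arithmetic can be retraced approximately. The sketch below is not King's own working: the text does not say how the 105,000 London houses were divided among the four London districts, so the code brackets London between the lowest and highest London multipliers quoted (4.4 and 5.4), and it assumes the "bonus" percentages were applied to the multiplied totals. All names are hypothetical.

```python
HOUSES = {"london": 105_000, "towns": 195_000, "villages": 1_000_000}
HOMELESS = 80_000


def king_total(london_multiplier):
    """Total population for one assumed average London multiplier."""
    london = HOUSES["london"] * london_multiplier * 1.10  # 10% bonus for London
    towns = HOUSES["towns"] * 4.3 * 1.02                  # 2% bonus for other towns
    villages = HOUSES["villages"] * 4.0 * 1.01            # 1% bonus for villages
    return london + towns + villages + HOMELESS


low = king_total(4.4)   # every London house counted at the out-parish rate
high = king_total(5.4)  # every London house counted at the within-the-walls rate
print(f"{low:,.0f} to {high:,.0f}")
```

Whatever the split of London houses, the total stays within about a tenth of a million of the 5½ millions quoted.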

How King obtained his multiplier is not clear. In addition to the Hearth 
Office data he says he used ‘the assessments on marriages, births, and burials, 
parish registers and other public accounts’ and that from these he deduced the 
multipliers, but this is rather vague. He also classified the population by sex, 
civil state and age (under 1, under 5, under 10, under 16, above 16, above 21, 
above 25, above 60). How he reached these figures is not explained. 

But nothing succeeds like success. As we shall see, his estimate of the total
population is probably very near the truth and Prof. Westergaard has remarked
that, judging from Swedish observations made a few years later, King's age
distribution is quite reasonable.

* Prof. E. C. K. Gonner (J.R. Statist. Soc. 76 (1912-13), 261-07), in an interesting paper which
I have largely used in writing this chapter, remarks that the ‘houses’ of the Hearth Office must
have been really families or separate occupations as King indeed realized, and thinks that King
fell into some confusion in attempting to replace families by houses. Gonner argued that the best
way was to proceed on the basis that the Hearth Office unit of a family should be retained and
be corrected for empty houses, blacksmiths’ shops, etc. on the basis of 1801 census returns and the
multiplier used should be persons per family of 1801. The result is to give a figure about a quarter
of a million larger than King’s. The method described in the text also leads to the conclusion that
King somewhat understated the population.

As a statistical prophet King was no more successful than his contemporaries
(and successors). He believed that down to his time the population of England 
had doubled in 435 years, that the next doubling would require from 1200 to 
1300 years and that in A.D. 3500-3600 the population would reach 22 millions 
of souls, in case, as he cautiously adds ‘the world should last so long’. His 
estimates as a matter of arithmetical curiosity are excellently fitted by a logistic 
with an upper asymptote of fifteen millions and would give the present
population as about eight millions.

Modern statisticians, such as Farr and Brownlee, have confirmed King's
estimate of the population at the end of the seventeenth century in the following
way. After 1801 the population was known by actual counting and for the first
forty years of the nineteenth century baptisms and burials were still the only
data of births and deaths. If one started from, say, the enumeration of 1831
and worked back to the population of 1821 by adding the numbers of burials
and subtracting the number of baptisms then, if these really measured deaths
and births, the result ought to agree with the census enumeration, provided
immigration and emigration balanced. But the burials and baptisms
understated deaths and births. One might adjust the figures by multipliers to bring
the result into agreement with the census and then test against another backward
run of ten years. Brownlee found that if the number of burials were multiplied
by 1.2 and the number of baptisms by 1.243, the agreement was good.
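The backward step described above is a single line of arithmetic. The following sketch is a hedged illustration rather than Brownlee's own procedure: the function name is hypothetical, the burial and baptism counts in the example are invented round numbers, and migration is assumed to balance, as in the text. Only the two multipliers come from the passage.

```python
def back_project(later_census, burials, baptisms,
                 burial_factor=1.2, baptism_factor=1.243):
    """Estimate the population ten years before a census.

    Deaths are added back and births removed, after scaling the
    registered burials and baptisms by the quoted multipliers.
    """
    deaths = burial_factor * burials
    births = baptism_factor * baptisms
    return later_census + deaths - births


# Invented figures, purely to show the mechanics.
earlier = back_project(later_census=1_000_000, burials=200_000, baptisms=300_000)
print(round(earlier))
```

Repeating the step decade by decade carries the estimate back from a counted population to the start of the eighteenth century.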

This may seem a highly conjectural method, but it certainly gives quite good
results. The difference between births and deaths estimated in this way for the
decennium 1801-10, I find to be about 12.4 per 1000 living. If one multiplies
the enumerated population of 1801 by (1.0124)^10 we reach 10.1 millions, not a
bad approximation to 10.2 millions actually counted. Assuming that before
1801 burials and baptisms had the same relation to deaths and births as between
1801 and 1841, we can work backwards to the beginning of the eighteenth
century with the result that the population then was about 5.8 millions, not
much more than King's estimate. In view of the following discussion it will be
useful to consider the probable state of the population (as determined by these
methods) in the eighteenth century. In the first sixty years of the century it
grew very slowly, was about 6.1 millions in 1751 and 6.5 millions in 1761. It
then began to increase faster, was 7.5 in 1781, 8.2 by 1791 and 9.2 at the census
of 1801 (8.9 as enumerated, but an estimate of a deficit of 1/30th was made).
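The compound-growth check in this paragraph is easily reproduced. The sketch assumes, as the wording suggests, that the multiplication starts from the enumerated 8.9 millions rather than the corrected 9.2 millions; the variable names are illustrative.

```python
enumerated_1801 = 8.9          # millions, as enumerated at the 1801 census
natural_increase = 12.4 / 1000  # excess of births over deaths per person per year

projected_1811 = enumerated_1801 * (1 + natural_increase) ** 10
print(round(projected_1811, 1))  # millions
```

The projection falls about a tenth of a million short of the 10.2 millions actually counted.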

From Gregory King’s time to the census of 1801 we have a series of more or
less intelligent guesses.

These are well described in Prof. Gonner's paper.* Two schools of thought


* J.R. Statist. Soc. 76 (1912-13), 261-06. 




did battle in the eighteenth century: the pessimists who held that the population
was decreasing and the country going steadily to the dogs, and the optimists
who believed just the contrary. Both used the same weapons. The heavy artillery
was a return of houses for taxation purposes increased conjecturally by a figure
for houses which escaped taxation, the sum multiplied by a conjectural average
of persons per house. As light artillery one had the yield of taxes on commodities
and the returns of baptisms and burials.

A pessimist put the number of untaxed houses low and the multiplier low, 
and an optimist raised both. 

The first controversy which took place in 1754 in the proceedings of the Royal 
Society did not attract much notice. Brackenridge (mildly pessimistic) pointed 
out that the number of houses assessed to house tax had decreased between 1710 
and 1754 from 729,048 to 690,000, which suggested a decrease of population 
(by a previous conjectural calculation based on burials and baptisms, he had 
reckoned a small increase, which was probably correct). Much turned on the 
number of houses which did not pay tax (either because the occupant was in
receipt of alms, did not, owing to poverty, contribute to the church or poor rate,
or through mere default). Brackenridge put the number at 200,000. His critic,
Forster, argued that Brackenridge under-stated the number of untaxed houses, 
adducing a sample of nine country parishes with 588 houses of which only 177 
were taxed and a market town with 229 taxed houses out of 448. Using these 
figures as a basis for conjecture Forster raises Brackenridge’s 890,000 to 1,427,110. 
From this (with a multiplier of 6 for town houses and 5 for country houses) he 
reaches a population of seven and a half millions—probably a considerable over- 
estimate. 

The next controversy was a quarter of a century later (in a period when the 
population was certainly increasing) and its originator was Dr Richard Price 
(1723-91), who has attained a posthumous celebrity reminiscent of the man 
whose title to distinction was that he had once been kicked by George IV. Most 
readers know him as the preacher of a sermon which was the text of Burke’s 
Reflections, most students of economic history know him as the inventor of that 
theory of the virtue of a Sinking Fund which has been likened to the economic 
system of a community which prospered by taking in one another's washing;
most vital statisticians remember him as the computor of the Northampton 
Life Table which gave a seriously incorrect picture of prevailing mortality and in- 
directly cost the country a large sum of money. Finally, in the controversy about 
to be described, Price was pertinaciously in the wrong on all the main issues. 

The apparent inference from all this is that Price was either a fool or a knave. 
Gainsborough’s portrait of the Rev. Richard Price, which hangs (or did hang) 
in the Board Room of the Equitable Assurance Society, gives no support to the
hypothesis that Price was a fool; his life would be a promising field of research 
for a young historian with a competent knowledge of economics. His importance 




in statistical history is not great enough to justify me in a critical study (even 
if I had the necessary training in finance and economics). My guess is that Price 
was an able, self-confident, original-minded man, who knew a good deal about
many things and had no exact knowledge of anything. He had ‘a way’ with 
him, he could interest people. In fact he had some of the qualities of Petty. It is 
easy enough to make jokes about his notion of the mysterious power of money 
to increase at compound interest and it is possible that William Pitt the younger 
(who was only a boy when he adopted Price’s theory) was not a good economic 
reasoner. Still, even 150 years ago, there were bankers and Treasury officials, 
and it is possible that both they (and Price) were not so much bad theoretical 
reasoners as shrewd opportunists, that they were deliberately blind to the 
speciousness of an attractive defence of a desirable financial expedient. I have 
myself sometimes wondered whether, in the eighteenth century, an Assurance 
Society would have minded very much if a Life Table had erred on the pessimistic 
side. 

Price did not enter on the population question with an unbiased mind. He 
was a keen politician and he believed that the policy of the government was bad 
for the country; he also believed that the wealth of a country was its people. 
Hence he believed that the population was declining and nothing shook that 
belief. Had he survived another ten years, until the first census, he would 
probably have disputed the accuracy of the returns. 

Price began with the figures of houses in 1690, which he cited from Davenant 
(they were really due to King, who communicated them to Davenant), making 
the total 1,319,000. He then gave the figures of assessed, chargeable and cottages
(cottages being houses too small to be taxed) as 678,915, 25,628 and 276,149,
making a total of 980,692 in 1761. In 1777 they were 682,077, 19,396 and 
251,261, a total of 952,734. On this basis he concluded that the population had 
declined by about one and a half millions and was actually less than five millions. 

Howlett and Wales, Price’s chief opponents, impugned every step in the 
reasoning. First, they pointed out that in the estimate for 1690 there was almost 
certainly a confusion between families and houses. Then they argued that many 
householders evaded duty (for instance by the simple plan of blocking up
windows; the prayer ‘Lighten our darkness, we beseech thee, Oh Pitt’ is still
remembered) and showed by direct enumeration in certain parishes that the
returns were inaccurate. Finally, they gave reason to think that Price’s multi- 
plier was too small. On each of these points they were probably right. Indeed 
Price was obliged to admit the validity of some of their criticisms. But he 
declined to budge; sometimes he took ad captandum advantage of arithmetical 
slips by his adversaries, sometimes he declined to admit that their samples were 
representative, sometimes he tried to ignore the effect of corrections which he 
was forced to make.


These were the principal arguments. Both parties used the data of burials
and baptisms as subsidiary arguments. Price seems only to have used the London
Bills, which rather let him down, because although they seemed to help for some
part of the century, he admits that by 1773 London was increasing and, very
characteristically, uses this as in his favour: ‘But it appears that, in truth, this
is an event more to be dreaded than desired. The more London increases, the
more the rest of the country must be deserted.’ Price’s adversaries went farther
afield and counted burials and christenings in 162 parishes in all parts of England
for two quinquennia, one beginning in 1758, the other in 1773. Baptisms increased
from 47,638 to 59,567, burials from 49,553 to 53,030.

But neither party put much weight upon what we should now consider 
primary evidence; rightly, because of its incompleteness.

But these data were not wholly neglected by medical writers as we shall see
in later sections. One may fairly say on the evidence here summarized that the
eighteenth-century political arithmeticians of England made no advance whatever
upon the position reached by Graunt, Petty and King. They were second-rate
imitators of men of genius.


(To be concluded)


MOMENTS OF THE DISTRIBUTIONS OF POWERS AND
PRODUCTS OF NORMAL VARIATES


By J. B. S. HALDANE, F.R.S. 


CONTENTS
1. Introductory
2. Distribution of the cube of a normal variate
3. Distribution of any power of a normal variate
4. Distribution of the product of three independent normal variates
5. Distribution of the product of n independent normal variates
6. Distribution of the product of two correlated normal variates
7. Distribution of the product of three correlated normal variates
8. The product of n correlated normal variates
9. The Galton-Macalister distribution
10. Biological applications
11. Summary
References


1. INTRODUCTORY


When numerous organisms or organs are weighed, the distribution of the weights
is often positively skew. On the other hand, the distribution of linear
measurements is often very close to normal. The question then arises, given that the
volume of an organ is proportional to the product of three mutually
perpendicular measurements, each being normally distributed, what will be the
distribution of the volumes? In general the three linear measurements will be
correlated, and the problem might appear hopelessly complex. However, it will 
be shown later that, provided certain conditions are fulfilled, the distributions 
all lie very close together. 

The problem can obviously be generalized to cover the case of the product of 
any number of normal variates. The most interesting cases are those of two, 
three, or an infinite number of such variates. Further, two special cases are com- 
paratively simple. When the coefficients of correlation are all equal to unity and 
the coefficients of variation equal we are concerned with the distribution of a
power of the normal variate. When the correlations all vanish, we are concerned
with that of a product of several uncorrelated normal variates. 

It is not, of course, suggested that all skew variation of weights is to be
explained on these lines. For example the Galton-Macalister distribution, in
which the logarithm of the variate is normally distributed, can be thought of as
arising in at least two different ways. The weight may be the product of a large
number of normal variates; or for constant cell size, the number of cell generations
may be distributed in a certain manner about a mean in each organ, these numbers
being normally distributed. The highly skew variation of human weights is
probably to be explained by the fact that a rather small fraction of the human
race lays down very large quantities of fat. Nevertheless, it will be shown that
simple criteria will determine whether observed positive skewness and
leptokurtosis, too large to be ascribed to sampling error, can be explained on the lines
discussed above. And in any particular case it is worth while finding out whether
this is so.

Any measure of asymmetry, such as $\gamma_1 = \sqrt{\beta_1}$, or of kurtosis, such as $\gamma_2 = \beta_2 - 3$,
is a dimensionless number independent of the unit of measurement. Hence in a
transformed normal distribution of the type here considered it must clearly be
a function of the only dimensionless number derivable from the first two moments,
namely, the coefficient of variation, c. In what follows we shall generally use m
for the mean of the original normal distribution and $km^2$ for its variance, so that
$k^{1/2}$ is the coefficient of variation. The usual notation is used for the mean, variance
and other moments and cumulants of the derived distribution.

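The distribution of the product of a pair of correlated normal variates has
already been fully discussed by Craig (1936). If $m_1$ and $m_2$ are the mean values
of X and Y, $k_1^{1/2}$ and $k_2^{1/2}$ their coefficients of variation, and $\rho$ their coefficient of
correlation, then Craig finds for the cumulant generating function of $XY/(m_1 m_2 \sqrt{k_1 k_2})$

$$K(\theta) = \frac{2\sqrt{k_1 k_2}\,\theta + (k_1 + k_2 - 2\rho\sqrt{k_1 k_2})\,\theta^2}{2k_1 k_2\,[1-(1+\rho)\theta]\,[1+(1-\rho)\theta]} - \tfrac{1}{2}\log[1-(1+\rho)\theta] - \tfrac{1}{2}\log[1+(1-\rho)\theta].$$

If $k_1 = k_2 = k$,

$$K(\theta) = \frac{\theta}{k[1-(1+\rho)\theta]} - \tfrac{1}{2}\log[1-(1+\rho)\theta] - \tfrac{1}{2}\log[1+(1-\rho)\theta].$$

The rth cumulant of $XY/(m_1 m_2 \sqrt{k_1 k_2})$ is

$$\tfrac{1}{4}\,r!\left[(1+\rho)^{r-1}\left(k_1^{-1/2}+k_2^{-1/2}\right)^2 - (\rho-1)^{r-1}\left(k_1^{-1/2}-k_2^{-1/2}\right)^2\right] + \tfrac{1}{2}(r-1)!\left[(1+\rho)^r + (\rho-1)^r\right].$$

If $k_1 = k_2 = k$, the rth cumulant of XY is

$$K_r = (m_1 m_2)^r\,(r-1)!\,k^{r-1}\left[r(1+\rho)^{r-1} + \tfrac{1}{2}k\{(1+\rho)^r + (\rho-1)^r\}\right].$$

Craig’s discussion is mainly confined to the cases where $k_1$ and $k_2$ are large. In
the cases of greatest biometrical interest they are small, and some points of
interest arise, besides those dealt with by Craig.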


2. DISTRIBUTION OF THE CUBE OF A NORMAL VARIATE

If x be a reduced normal variate, that is to say, a variate whose mean is zero
and standard deviation unity, it is required to find the distribution of $X^3$, where
$X = m(1 + k^{1/2}x)$. We note that

$$\overline{x^{2r+1}} = 0, \quad\text{and}\quad \overline{x^{2r}} = (2r-1)(2r-3)(2r-5)\cdots 3\cdot 1 = \frac{(2r)!}{2^r\,r!}.$$

Hence

$$\begin{aligned}
\overline{X^3} &= m^3\,\overline{(1 + 3k^{1/2}x + 3kx^2 + k^{3/2}x^3)} = m^3(1+3k),\\
\overline{X^6} &= m^6\,\overline{(1 + 6k^{1/2}x + 15kx^2 + 20k^{3/2}x^3 + 15k^2x^4 + 6k^{5/2}x^5 + k^3x^6)}\\
&= m^6(1 + 15k + 45k^2 + 15k^3),
\end{aligned}$$

so that the first four moments of $X^3$ about zero are

$$\begin{aligned}
\mu_1' &= m^3(1 + 3k),\\
\mu_2' &= m^6(1 + 15k + 45k^2 + 15k^3),\\
\mu_3' &= m^9(1 + 36k + 378k^2 + 1260k^3 + 945k^4),\\
\mu_4' &= m^{12}(1 + 66k + 1485k^2 + 13860k^3 + 51975k^4 + 62370k^5 + 10395k^6).
\end{aligned}$$

Hence the moments about the mean $m^3(1 + 3k)$, and the cumulants, are

$$\begin{aligned}
\kappa_1 &= \mu_1' = m^3(1 + 3k),\\
\kappa_2 &= \mu_2 = 3m^6k(3 + 12k + 5k^2),\\
\kappa_3 &= \mu_3 = 54m^9k^2(3 + 16k + 15k^2),\\
\mu_4 &= 27m^{12}k^2(9 + 240k + 1326k^2 + 1920k^3 + 385k^4),\\
\kappa_4 &= 648m^{12}k^3(7 + 48k + 75k^2 + 15k^3).
\end{aligned} \tag{1}$$
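The algebra leading to equations (1) can be rechecked mechanically. The sketch below is not part of the original derivation: it represents each moment of $X^3$, divided by the appropriate power of m, as a list of integer coefficients of $k^0, k^1, \dots$ and forms the central moments from the moments about zero. The helper names are hypothetical.

```python
from math import comb


def normal_even_moment(s):
    """Mean of x^(2s) for a reduced normal variate: (2s-1)(2s-3)...3.1."""
    value = 1
    for odd in range(1, 2 * s, 2):
        value *= odd
    return value


def raw_moment(power):
    """Coefficients of k^0, k^1, ... in the mean of (1 + k^(1/2) x)^power."""
    return [comb(power, 2 * s) * normal_even_moment(s) for s in range(power // 2 + 1)]


def poly_mul(p, q):
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


def poly_sum(*terms):
    """Add (scalar, polynomial) pairs, padding to a common length."""
    out = [0] * max(len(p) for _, p in terms)
    for scalar, p in terms:
        for i, a in enumerate(p):
            out[i] += scalar * a
    return out


# Moments of X^3 about zero are the 3rd, 6th, 9th and 12th moments of X.
m1, m2, m3, m4 = (raw_moment(3 * r) for r in (1, 2, 3, 4))
m1_sq = poly_mul(m1, m1)

mu2 = poly_sum((1, m2), (-1, m1_sq))
mu3 = poly_sum((1, m3), (-3, poly_mul(m1, m2)), (2, poly_mul(m1_sq, m1)))
mu4 = poly_sum((1, m4), (-4, poly_mul(m1, m3)),
               (6, poly_mul(m1_sq, m2)), (-3, poly_mul(m1_sq, m1_sq)))
kappa4 = poly_sum((1, mu4), (-3, poly_mul(mu2, mu2)))

print(mu2, mu3, mu4, kappa4, sep="\n")
```

Each printed list gives the coefficients of ascending powers of k; factoring out 3, 54, 27 and 648 respectively recovers the bracketed polynomials of equations (1).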


It is to be noted that in these calculations, and in the majority, though not
all, of those of this paper, the expressions for the moments about the mean are
simpler than those for the moments about zero. Successive moments about the
mean are therefore best calculated from the former, so far as possible, rather
than the latter. Thus the equation

$$\mu_4 = \mu_4' - \mu_1'(4\mu_3 + 6\mu_1'\mu_2 + \mu_1'^3)$$

involves less algebra than the more usual

$$\mu_4 = \mu_4' - 4\mu_1'\mu_3' + 6\mu_1'^2\mu_2' - 3\mu_1'^4.$$

It follows that

$$\begin{aligned}
\sigma &= 3m^3k^{1/2}\left(1 + 4k + \tfrac{5}{3}k^2\right)^{1/2},\\
c &= \frac{(9k + 36k^2 + 15k^3)^{1/2}}{1 + 3k},\\
\gamma_1 &= \frac{6(3k)^{1/2}(3 + 16k + 15k^2)}{(3 + 12k + 5k^2)^{3/2}},\\
\gamma_2 &= \frac{72k(7 + 48k + 75k^2 + 15k^3)}{(3 + 12k + 5k^2)^2}.
\end{aligned}$$

Hence the distribution is positively skew, and leptokurtic. If c is small, we
have, approximately,

$$\gamma_1 = 2c + \frac{2c^3}{27} + \dots, \qquad \gamma_2 = \frac{56c^2}{9} + \frac{16c^4}{27} + \dots \tag{2}$$

Since in most practical cases $c^2 < 0.01$, the first term will have an error of less
than 0.1%, and no attention need be paid to the later terms.


3. DISTRIBUTION OF ANY POWER OF A NORMAL VARIATE

It is required to find the moments of $m^n(1 + k^{1/2}x)^n$. The mean value of its rth
power is clearly

$$m^{nr}\left[1 + \frac{(nr)!\,k}{2\,(nr-2)!} + \frac{(nr)!\,k^2}{2^2\,2!\,(nr-4)!} + \dots + \frac{(nr)!\,k^s}{2^s\,s!\,(nr-2s)!} + \dots\right].$$

This converges for all finite values of n, since k is small, though it only
terminates for positive integral values of nr. After somewhat tedious algebra,
we find for the mean and other moments and cumulants:

$$\begin{aligned}
\kappa_1 &= \mu_1' = m^n[1 + \tfrac{1}{2}n(n-1)k + \dots],\\
\kappa_2 &= \mu_2 = m^{2n}n^2k[1 + \tfrac{1}{2}(n-1)(3n-5)k + \dots],\\
\kappa_3 &= \mu_3 = 3m^{3n}n^3(n-1)k^2[1 + \tfrac{1}{6}(17n^2 - 55n + 44)k + \dots],\\
\mu_4 &= 3m^{4n}n^4k^2[1 + \tfrac{5}{3}(n-1)(5n-7)k + \dots],\\
\kappa_4 &= 4m^{4n}n^4(n-1)(4n-5)k^3[1 + O(k)].
\end{aligned} \tag{3}$$

Hence

$$\begin{aligned}
c &= nk^{1/2}[1 + \tfrac{1}{4}(n-1)(n-5)k + \dots],\\
\gamma_1 &= 3(n-1)k^{1/2}[1 + \tfrac{1}{12}(7n^2 - 38n + 43)k + \dots],\\
\gamma_2 &= 4(n-1)(4n-5)k[1 + O(k)].
\end{aligned}$$

And when c is small

$$\gamma_1 = \frac{3(n-1)}{n}\,c\,[1 + O(c^2)], \qquad \gamma_2 = \frac{4(n-1)(4n-5)}{n^2}\,c^2\,[1 + O(c^2)]. \tag{4}$$

When n = 2, the expressions become very simple,

$$\begin{aligned}
\kappa_1 &= m^2(1+k),\\
\kappa_2 &= 2m^4k(2+k),\\
\kappa_3 &= 8m^6k^2(3+k),\\
\kappa_r &= (r-1)!\,m^{2r}(2k)^{r-1}(r+k).
\end{aligned} \tag{5}$$

This is in accordance with Craig’s formula, putting $k_1 = k_2 = k$, $\rho = 1$.





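4. DISTRIBUTION OF THE PRODUCT OF THREE INDEPENDENT NORMAL VARIATES

Let x, y, z be independent reduced normal variates, so that $\overline{x^2} = \overline{y^2} = \overline{z^2} = 1$,
$\overline{x^4} = \overline{y^4} = \overline{z^4} = 3$, etc. Let $X = m_1(1 + k_1^{1/2}x)$, $Y = m_2(1 + k_2^{1/2}y)$, $Z = m_3(1 + k_3^{1/2}z)$.
Here $k_1^{1/2}$, $k_2^{1/2}$, $k_3^{1/2}$ are coefficients of variation of linear measurements, so that $k_1$, $k_2$, $k_3$
are commonly about 0.001. It is required to find the moments of the distribution
of XYZ. In the expansion of a power of XYZ we need only consider even powers
of x, y and z. For example,

$$\begin{aligned}
\overline{X^2Y^2Z^2} &= m_1^2m_2^2m_3^2\,\overline{(1 + k_1^{1/2}x)^2(1 + k_2^{1/2}y)^2(1 + k_3^{1/2}z)^2}\\
&= m_1^2m_2^2m_3^2\,\overline{(1 + k_1x^2)(1 + k_2y^2)(1 + k_3z^2)}\\
&= m_1^2m_2^2m_3^2(1 + k_1)(1 + k_2)(1 + k_3).
\end{aligned}$$

So if $m_1m_2m_3 = V$ (a volume), the moments of XYZ about zero are

$$\begin{aligned}
\mu_1' &= V,\\
\mu_2' &= V^2(1+k_1)(1+k_2)(1+k_3),\\
\mu_3' &= V^3(1+3k_1)(1+3k_2)(1+3k_3),\\
\mu_4' &= V^4(1+6k_1+3k_1^2)(1+6k_2+3k_2^2)(1+6k_3+3k_3^2).
\end{aligned}$$

So

$$\begin{aligned}
\kappa_2 &= \mu_2 = V^2(\Sigma k_1 + \Sigma k_1k_2 + k_1k_2k_3),\\
\kappa_3 &= \mu_3 = 6V^3(\Sigma k_1k_2 + 4k_1k_2k_3),\\
\mu_4 &= 3V^4[\Sigma k_1^2 + 2\Sigma k_1k_2 + 6\Sigma k_1^2k_2 + 38k_1k_2k_3 + 3\Sigma k_1^2k_2^2\\
&\qquad + 36\Sigma k_1^2k_2k_3 + 18\Sigma k_1^2k_2^2k_3 + 9k_1^2k_2^2k_3^2],\\
\kappa_4 &= 6V^4[2\Sigma k_1^2k_2 + \Sigma k_1^2k_2^2 + 4k_1k_2k_3(4 + 4\Sigma k_1 + 2\Sigma k_1k_2 + k_1k_2k_3)].
\end{aligned}$$

In practice we can neglect all terms except the leading one, and write

$$\begin{aligned}
\kappa_1 &= V,\\
\kappa_2 &= V^2(k_1 + k_2 + k_3),\\
\kappa_3 &= 6V^3(k_2k_3 + k_3k_1 + k_1k_2),\\
\kappa_4 &= 12V^4[k_1^2(k_2+k_3) + k_2^2(k_3+k_1) + k_3^2(k_1+k_2) + 8k_1k_2k_3].
\end{aligned} \tag{6}$$

If $k_1 = k_2 = k_3 = k$, we find

$$\begin{aligned}
\kappa_1 &= V,\\
\kappa_2 &= V^2k(3 + 3k + k^2),\\
\kappa_3 &= 6V^3k^2(3 + 4k),\\
\kappa_4 &= 6V^4k^3(28 + 51k + 24k^2 + 4k^3),
\end{aligned} \tag{7}$$

which may be compared with equations (1). Thus

$$\begin{aligned}
c &= (3k + 3k^2 + k^3)^{1/2},\\
\gamma_1 &= 2c - \frac{4c^3}{9} + \dots,\\
\gamma_2 &= \frac{56c^2}{9} - \frac{22c^4}{9} + \dots
\end{aligned} \tag{8}$$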

It will be noticed that the leading terms of equations (2) and (8) are identical.
Hence the two distributions will, in practice, be quite indistinguishable, even if
tens of thousands of individuals are weighed. It follows that the cube root of the
product of three independent variates is almost normally distributed. Further,
the relations of $\gamma_1$ and $\gamma_2$ to c are little altered when $k_1$, $k_2$ and $k_3$ are different,
provided that they are of the same order of magnitude. Thus if $k_1 = 0.001$,
$k_2 = 0.002$, $k_3 = 0.003$, we find $\gamma_1 = 11c/6$ instead of $2c$, and $\gamma_2 = 16c^2/3$
instead of $6.2c^2$. Only a very large sample indeed would reveal departures of this
order from the simpler expressions of equations (2) and (8).
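The numerical example with unequal values of k can be checked directly from the exact moments about zero given in this section. The sketch below is only an illustration: the function name is hypothetical, the mean V is taken as unity (the shape measures do not depend on it), and ordinary floating-point arithmetic is used.

```python
from math import prod, sqrt


def product_shape(ks):
    """Return (c, gamma1, gamma2) for a product of independent normal
    variates whose squared coefficients of variation are ks (mean taken as 1)."""
    m2 = prod(1 + k for k in ks)
    m3 = prod(1 + 3 * k for k in ks)
    m4 = prod(1 + 6 * k + 3 * k * k for k in ks)
    mu2 = m2 - 1
    mu3 = m3 - 3 * m2 + 2
    mu4 = m4 - 4 * m3 + 6 * m2 - 3
    kappa4 = mu4 - 3 * mu2 ** 2
    return sqrt(mu2), mu3 / mu2 ** 1.5, kappa4 / mu2 ** 2


c, g1, g2 = product_shape([0.001, 0.002, 0.003])
print(g1 / c, g2 / c ** 2)        # close to 11/6 and 16/3

c_eq, g1_eq, g2_eq = product_shape([0.001] * 3)
print(g1_eq / c_eq, g2_eq / c_eq ** 2)  # close to 2 and 56/9
```

The small departures from 11/6 and 16/3 come from the higher-order terms that the leading-term approximation neglects.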


5. DISTRIBUTION OF THE PRODUCT OF n INDEPENDENT NORMAL VARIATES 


Let $x_1, x_2, x_3, \dots, x_r, \dots, x_n$ be n independent reduced normal variates. The
general case is of course somewhat complicated, so we shall only investigate the
special case where all the coefficients of variation are equal.

Then if $X_r = m_r(1 + k^{1/2}x_r)$, and $M = \Pi m_r$, the moments about zero of the
distribution of the product $\Pi X_r$ are

$$\begin{aligned}
\mu_1' &= M,\\
\mu_2' &= M^2(1+k)^n,\\
\mu_3' &= M^3(1+3k)^n,\\
\mu_4' &= M^4(1+6k+3k^2)^n,\\
\mu_5' &= M^5(1+10k+15k^2)^n,\\
\mu_6' &= M^6(1+15k+45k^2+15k^3)^n.
\end{aligned}$$

Hence

$$\begin{aligned}
\mu_2 &= M^2[(1+k)^n - 1],\\
\mu_3 &= M^3[(1+3k)^n - 3(1+k)^n + 2],\\
\mu_4 &= M^4[(1+6k+3k^2)^n - 4(1+3k)^n + 6(1+k)^n - 3],\\
\mu_5 &= M^5[(1+10k+15k^2)^n - 5(1+6k+3k^2)^n + 10(1+3k)^n - 10(1+k)^n + 4],\\
\mu_6 &= M^6[(1+15k+45k^2+15k^3)^n - 6(1+10k+15k^2)^n + 15(1+6k+3k^2)^n\\
&\qquad - 20(1+3k)^n + 15(1+k)^n - 5],\\
\kappa_4 &= M^4[(1+6k+3k^2)^n - 3(1+2k+k^2)^n - 4(1+3k)^n + 12(1+k)^n - 6],\\
\kappa_5 &= M^5\{(1+10k+15k^2)^n + 5[-(1+6k+3k^2)^n - 2(1+4k+3k^2)^n\\
&\qquad + 6(1+2k+k^2)^n + 4(1+3k)^n - 12(1+k)^n] + 24\},\\
\kappa_6 &= M^6\{(1+15k+45k^2+15k^3)^n + 15[2(1+3k+3k^2+k^3)^n\\
&\qquad - (1+7k+9k^2+3k^3)^n + 2(1+6k+3k^2)^n + 8(1+4k+3k^2)^n\\
&\qquad - 18(1+2k+k^2)^n - 8(1+3k)^n + 24(1+k)^n - 8]\\
&\qquad - 6(1+10k+15k^2)^n - 10(1+6k+9k^2)^n\}.
\end{aligned}$$


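The higher cumulants can be evaluated as follows:

$$\kappa_3 = 3M^3\sum_{r=1}^{n}\binom{n}{r}(3^{r-1} - 1)\,k^r,$$

$$\kappa_4 = 3M^4\sum_{r=1}^{n}\binom{n}{r}(3^{r-1} - 1)\,[(2+k)^r - 4]\,k^r, \text{ etc.}$$

Thus we find

$$\begin{aligned}
\kappa_1 &= M,\\
\kappa_2 &= M^2nk[1 + \tfrac{1}{2}(n-1)k + \dots],\\
\kappa_3 &= M^3n(n-1)k^2[3 + 4(n-2)k + \dots],\\
\kappa_4 &= M^4n(n-1)k^3[4(4n-5) + 3(13n^2 - 49n + 47)k + \dots],\\
\kappa_5 &= 5M^5n(n-1)k^4[(5n-6)(5n-7) + 2(n-2)(143n-345)k + \dots],\\
\kappa_6 &= 3M^6n(n-1)k^5[432n^3 - 1853n^2 + 2917n - 1758 + O(k)].
\end{aligned}$$

These may be compared with equations (3).

$$\begin{aligned}
c &= (nk)^{1/2}[1 + \tfrac{1}{4}(n-1)k + \dots],\\
\gamma_1 &= 3(n-1)\left(\frac{k}{n}\right)^{1/2}[1 + O(k)],\\
\gamma_2 &= \frac{4(n-1)(4n-5)k}{n}[1 + O(k)],
\end{aligned} \tag{9}$$

and when c is small

$$\gamma_1 = \frac{3(n-1)}{n}\,c\,[1 + O(c^2)], \qquad \gamma_2 = \frac{4(n-1)(4n-5)}{n^2}\,c^2\,[1 + O(c^2)],$$

as in equations (4).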
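The small-c relations can be compared with the exact expressions for $\mu_2$, $\mu_3$ and $\kappa_4$ given earlier in this section. The sketch below is illustrative only: the function name is hypothetical, M is taken as unity, and exact rational arithmetic is used until the final square root.

```python
from fractions import Fraction
from math import sqrt


def equal_k_product(n, k):
    """Exact kappa2, kappa3, kappa4 (with M = 1) for a product of n independent
    normal variates sharing the squared coefficient of variation k;
    returns (c, gamma1, gamma2) as floats."""
    k = Fraction(k)
    a2 = (1 + k) ** n
    a3 = (1 + 3 * k) ** n
    a4 = (1 + 6 * k + 3 * k ** 2) ** n
    kappa2 = a2 - 1
    kappa3 = a3 - 3 * a2 + 2
    kappa4 = a4 - 3 * a2 ** 2 - 4 * a3 + 12 * a2 - 6
    c = sqrt(kappa2)
    return c, float(kappa3) / c ** 3, float(kappa4) / float(kappa2) ** 2


n = 5
c, gamma1, gamma2 = equal_k_product(n, Fraction(1, 1000))
print(gamma1 / c, 3 * (n - 1) / n)                       # both near 2.4
print(gamma2 / c ** 2, 4 * (n - 1) * (4 * n - 5) / n ** 2)  # both near 9.6
```

Setting n = 2 or n = 3 in the same function gives ratios near 3/2 and 3, or near 2 and 56/9, in line with the special cases treated in the text.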


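When n = 2 we have the simple forms

$$\begin{aligned}
\kappa_1 &= M,\\
\kappa_2 &= M^2k(2+k),\\
\kappa_3 &= 6M^3k^2,\\
\kappa_4 &= 6M^4k^3(4+k),\\
\kappa_{2r} &= (2r-1)!\,M^{2r}k^{2r-1}(2r+k),\\
\kappa_{2r+1} &= (2r+1)!\,M^{2r+1}k^{2r}.
\end{aligned}$$

This is in accordance with Craig’s formula, putting $k_1 = k_2 = k$, $\rho = 0$.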


6. DISTRIBUTION OF THE PRODUCT OF TWO CORRELATED NORMAL VARIATES 


In general the different linear dimensions of an organ or organism are
positively correlated. Organic correlations may reach very high values, such as 0.9,
and presumably even higher values would be found for two approximately equal
diameters of an approximate solid of revolution, such as an apple or an egg, where
$\rho$ may be taken as unity. On the other hand, quite low values are found. Thus the
length and breadth of a homogeneous group of like-sexed adult human skulls
generally show a correlation of about 0.3 or 0.4. Hence the area of an organ will
commonly be proportional to the product of two correlated variables, the
volume to the product of three.

Let x and y be two reduced normal variates, with correlation $\rho$. The cumulant-
generating function for their joint distribution is $\tfrac{1}{2}(t^2 + 2\rho tu + u^2)$. That is to say
if r+s is odd, then $\overline{x^ry^s} = 0$. If r+s is even, then $\overline{x^ry^s}$ is the coefficient of $\dfrac{t^ru^s}{r!\,s!}$ in
$\exp\tfrac{1}{2}(t^2 + 2\rho tu + u^2)$. That is to say, $\overline{x^ry^s}$ is the coefficient of $t^ru^s$ in

$$(t^2 + 2\rho tu + u^2)^{\frac{1}{2}(r+s)},$$

multiplied by

$$\frac{r!\,s!}{2^{\frac{1}{2}(r+s)}\,\{\tfrac{1}{2}(r+s)\}!}.$$

It follows that

$$\begin{aligned}
&\overline{x^2} = 1, \quad \overline{xy} = \rho,\\
&\overline{x^4} = 3, \quad \overline{x^3y} = 3\rho, \quad \overline{x^2y^2} = 1 + 2\rho^2,\\
&\overline{x^6} = 15, \quad \overline{x^5y} = 15\rho, \quad \overline{x^4y^2} = 3(1+4\rho^2), \quad \overline{x^3y^3} = 3\rho(3+2\rho^2),\\
&\overline{x^8} = 105, \quad \overline{x^7y} = 105\rho, \quad \overline{x^6y^2} = 15(1+6\rho^2), \quad \overline{x^5y^3} = 15\rho(3+4\rho^2),\\
&\overline{x^4y^4} = 3(3 + 24\rho^2 + 8\rho^4),\\
&\overline{x^8y^2} = 105(1+8\rho^2), \quad \overline{x^7y^3} = 315\rho(1+2\rho^2), \quad \overline{x^6y^4} = 45(1 + 12\rho^2 + 8\rho^4),\\
&\overline{x^8y^4} = 315(1 + 16\rho^2 + 16\rho^4).
\end{aligned}$$

Of course $\overline{x^ry^s} = \overline{x^sy^r}$.
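The coefficient rule stated above lends itself to a short mechanical check. The sketch below is an illustration, not the author's procedure: it expands $(t^2 + 2\rho tu + u^2)^{(r+s)/2}$ by the multinomial theorem and returns each product moment as a mapping from the power of $\rho$ to its coefficient. The function name is hypothetical.

```python
from fractions import Fraction
from math import factorial


def product_moment(r, s):
    """Mean value of x^r y^s for reduced normal variates with correlation rho,
    returned as {j: coefficient of rho^j}. Empty when r + s is odd."""
    if (r + s) % 2:
        return {}
    half = (r + s) // 2
    moment = {}
    for j in range(min(r, s) + 1):
        if (r - j) % 2:
            continue  # r - j and s - j must both be even
        i, l = (r - j) // 2, (s - j) // 2  # powers of t^2 and u^2 in the term
        moment[j] = Fraction(
            factorial(r) * factorial(s) * 2 ** j,
            2 ** half * factorial(i) * factorial(j) * factorial(l),
        )
    return moment


print(product_moment(4, 4))  # coefficients of rho^0, rho^2, rho^4
print(product_moment(3, 3))  # coefficients of rho^1, rho^3
```

The mapping for r = s = 4 corresponds to $3(3 + 24\rho^2 + 8\rho^4)$, and that for r = s = 3 to $3\rho(3 + 2\rho^2)$.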
Thus the moments of zy about its mean p are 
My = lop, Mg = 2(3+p*), μι = 3(3+ 14p* + 9ρ3). 


"Hence Κα = 6(1+ 6p? + pt). Hence the distribution-is leptokurtic, and asym- 
metrical unless p vanishes. ο 
Now consider two correlated normal variates

$$X = m_1(1 + ax), \quad\text{and}\quad Y = m_2(1 + by),$$

where $a$ and $b$ are coefficients of variation. Let $m_1 m_2 = A$. Then the
moments of $XY$ about zero are

$$\mu'_1 = A(1 + \rho ab),$$
$$\mu'_2 = A^2[1 + a^2 + 4\rho ab + b^2 + (1+2\rho^2)a^2b^2],$$
$$\mu'_3 = A^3[1 + 3(a^2 + 3\rho ab + b^2) + 9ab\{\rho a^2 + (1+2\rho^2)ab + \rho b^2\} + 3\rho(3+2\rho^2)a^3b^3],$$
$$\mu'_4 = A^4[1 + 2(3a^2 + 8\rho ab + 3b^2) + 3\{a^4 + 16\rho a^3b + 12(1+2\rho^2)a^2b^2 + 16\rho ab^3 + b^4\}$$
$$\qquad + 6a^2b^2\{3(1+4\rho^2)a^2 + 8\rho(3+2\rho^2)ab + 3(1+4\rho^2)b^2\} + 3(3 + 24\rho^2 + 8\rho^4)a^4b^4].$$


Hence

$$\kappa_2 = \mu_2 = A^2[a^2 + 2\rho ab + b^2 + (1+\rho^2)a^2b^2],$$
$$\kappa_3 = \mu_3 = 2A^3ab[3\{\rho a^2 + (1+\rho^2)ab + \rho b^2\} + \rho(3+\rho^2)a^2b^2],$$
$$\mu_4 = 3A^4[(a^2 + 2\rho ab + b^2)^2 + 2a^2b^2\{(3+7\rho^2)a^2 + 2\rho(7+3\rho^2)ab + (3+7\rho^2)b^2\} + (3 + 14\rho^2 + 3\rho^4)a^4b^4],$$
$$\kappa_4 = 6A^4a^2b^2[2\{(1+3\rho^2)a^2 + 2\rho(3+\rho^2)ab + (1+3\rho^2)b^2\} + (1 + 6\rho^2 + \rho^4)a^2b^2], \tag{10}$$

in accordance with Craig's formula.
The most interesting case arises when $a = b = k^{1/2}$. This case is important
because in practice the coefficients of variation of different linear dimensions
of the same organ are often nearly equal. Thus those of linear skull
measurements in like-sexed adults in a racially homogeneous population are about
0.03; so that $k$ is about 0.001. The moments and cumulants of $XY$ are then

$$\kappa_1 = A(1 + \rho k),$$
$$\kappa_2 = \mu_2 = A^2k[2(1+\rho) + (1+\rho^2)k],$$
$$\kappa_3 = \mu_3 = 2A^3k^2[3(1+\rho)^2 + \rho(3+\rho^2)k],$$
$$\mu_4 = 3A^4k^2[4(1+\rho)^2 + 4(3 + 7\rho + 7\rho^2 + 3\rho^3)k + (3 + 14\rho^2 + 3\rho^4)k^2],$$
$$\kappa_4 = 6A^4k^3[4(1+\rho)^3 + (1 + 6\rho^2 + \rho^4)k]. \tag{11}$$

Hence

$$c^2 = 2(1+\rho)k\left[1 + \frac{1 - 4\rho - 3\rho^2}{2(1+\rho)}\,k + \dots\right],$$

while

$$\gamma_1 = \tfrac{3}{2}c\,[1 + O(c^2)], \qquad \gamma_2 = 3c^2[1 + O(c^2)],$$

exactly as found when the two variates are in a constant ratio or quite
independent. When however $a \neq b$, this is no longer the case, for it is clear
that the distribution becomes normal when $a$ or $b$ vanishes. If
$\dfrac{a^2 + b^2}{2ab} = p$, so that $p > 1$, we have, from equations (10),
. "e 1+ 2pp + p? 3c _148pp— Spt + pp? 303, 
η | το 3’ 8 σερ} 
both approximately. However, a and b may differ considerably without any very 
great effect of y,/c or γα/ο. Thus if a = 2b, so that p = $, 
_ 8(2+ 6p + 2p?) 8ο 
to (6t4pP 3 
That is to say, γι is θά % of 3c/2 if p = 0, and 86 % if p = 4, whilst y, is 51 % of 
3c if p = 0, and 71 95 if p = 4. 
So far we have assumed that $\rho$ is not negative. If $\rho = -1$,
$XY = A(1 - kx^2)$, so the mean is $A(1 - k)$, and the other cumulants are given by

$$\kappa_r = (-)^r\,2^{r-1}(r-1)!\,A^r k^r.$$

If $\rho = -1 + 2(k/3)^{1/2}$ approximately, $\kappa_3$ in equations (11)
vanishes, though the distribution does not become quite symmetrical. However,
negative values of $\rho$ are of no biological interest.
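As a numerical illustration of equations (11), the sketch below (Python; the function names are illustrative assumptions, not the author's notation) evaluates the four cumulants for $a = b = k^{1/2}$ and forms the ratios $\gamma_1/c$ and $\gamma_2/c^2$, which should lie close to $3/2$ and $3$ when $k$ is small and $\rho$ is not negative.

```python
from math import sqrt


def product_cumulants(A, k, rho):
    """Cumulants kappa_1..kappa_4 of XY from equations (11), with a = b = sqrt(k)."""
    k1 = A * (1 + rho * k)
    k2 = A**2 * k * (2 * (1 + rho) + (1 + rho**2) * k)
    k3 = 2 * A**3 * k**2 * (3 * (1 + rho) ** 2 + rho * (3 + rho**2) * k)
    k4 = 6 * A**4 * k**3 * (4 * (1 + rho) ** 3 + (1 + 6 * rho**2 + rho**4) * k)
    return k1, k2, k3, k4


def shape_ratios(A, k, rho):
    """Return (gamma_1 / c, gamma_2 / c**2) for the product XY."""
    k1, k2, k3, k4 = product_cumulants(A, k, rho)
    c = sqrt(k2) / k1          # coefficient of variation
    gamma1 = k3 / k2**1.5      # skewness
    gamma2 = k4 / k2**2        # excess kurtosis
    return gamma1 / c, gamma2 / c**2


if __name__ == "__main__":
    # k = 0.001 is the order of magnitude quoted for skull measurements.
    for rho in (0.0, 0.3, 0.9, 1.0):
        print(rho, shape_ratios(1.0, 0.001, rho))
```

The scale factor $A$ cancels from both ratios, so any positive value may be supplied.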


7. DISTRIBUTION OF THE PRODUCT OF THREE CORRELATED NORMAL VARIATES 


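Let $x$, $y$, $z$ be three reduced normal variates as before, their correlations
being $\rho_{yz} = \kappa$, $\rho_{zx} = \lambda$, $\rho_{xy} = \mu$. Thus the
cumulant-generating function is

$$\tfrac{1}{2}(t^2 + u^2 + v^2 + 2\kappa uv + 2\lambda vt + 2\mu tu).$$

Odd moments vanish, and the even moment $\overline{x^p y^q z^r}$ is the
coefficient of $t^p u^q v^r$ in

$$(t^2 + u^2 + v^2 + 2\kappa uv + 2\lambda vt + 2\mu tu)^{\frac{1}{2}(p+q+r)},$$

multiplied by

$$\frac{p!\,q!\,r!}{2^{\frac{1}{2}(p+q+r)}\,\{\tfrac{1}{2}(p+q+r)\}!}.$$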

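The required moments of products of two variates, such as

$$\overline{xy} = \mu, \qquad \overline{x^4y^2} = 3(1 + 4\mu^2),$$

have already been given. The required moments of products of all three are:

$$\overline{x^2yz} = \kappa + 2\lambda\mu,$$
$$\overline{x^4yz} = 3(\kappa + 4\lambda\mu), \quad \overline{x^3y^2z} = 3(\lambda + 2\kappa\mu + 2\lambda\mu^2), \quad \overline{x^2y^2z^2} = 1 + 2\Sigma\kappa^2 + 8\kappa\lambda\mu,$$
$$\overline{x^4y^3z} = 3(3\kappa + 12\lambda\mu + 12\kappa\mu^2 + 8\lambda\mu^3),$$
$$\overline{x^4y^2z^2} = 3(1 + 2\kappa^2 + 4\lambda^2 + 4\mu^2 + 16\kappa\lambda\mu + 8\lambda^2\mu^2),$$
$$\overline{x^3y^3z^2} = 3(3\mu + 6\kappa\lambda + 2\mu^3 + 6\kappa^2\mu + 6\lambda^2\mu + 12\kappa\lambda\mu^2),$$
$$\overline{x^4y^4z^2} = 3(3 + 12\kappa^2 + 12\lambda^2 + 24\mu^2 + 96\kappa\lambda\mu + 48\kappa^2\mu^2 + 48\lambda^2\mu^2 + 8\mu^4 + 64\kappa\lambda\mu^3),$$
$$\overline{x^4y^3z^3} = 9(3\kappa + 12\lambda\mu + 2\kappa^3 + 12\kappa\lambda^2 + 12\kappa\mu^2 + 24\kappa^2\lambda\mu + 8\lambda^3\mu + 8\lambda\mu^3 + 24\kappa\lambda^2\mu^2),$$
$$\overline{x^4y^4z^4} = 9[3 + 24\Sigma\kappa^2 + 8\Sigma\kappa^4 + 96\Sigma\lambda^2\mu^2 + 64\kappa\lambda\mu(3 + 2\Sigma\kappa^2 + 3\kappa\lambda\mu)].$$

Other moments (except those such as $\overline{xy^2z} = \lambda + 2\kappa\mu$,
which are derivable by transposition) are not needed for our purposes. It
follows that $xyz$ has a symmetrical and leptokurtic distribution, with mean
zero, and

$$\kappa_2 = \mu_2 = 1 + 2\Sigma\kappa^2 + 8\kappa\lambda\mu,$$
$$\kappa_4 = 12[2 + 17\Sigma\kappa^2 + 5\Sigma\kappa^4 + 70\Sigma\lambda^2\mu^2 + 4\kappa\lambda\mu(35 + 22\Sigma\kappa^2 + 32\kappa\lambda\mu)].$$

If $X = m_1(1 + ax)$, $Y = m_2(1 + by)$, $Z = m_3(1 + cz)$, the expressions for the
higher moments are very complicated, though it can easily be shown that the
mean is $m_1m_2m_3(1 + \kappa bc + \lambda ca + \mu ab)$, and the variance

$$m_1^2m_2^2m_3^2[\Sigma a^2 + 2\Sigma\kappa bc + \Sigma(1+\kappa^2)b^2c^2 + 2\Sigma(2\kappa + 3\lambda\mu)a^2bc + (1 + 2\Sigma\kappa^2 + 8\kappa\lambda\mu)a^2b^2c^2].$$
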
We shall only give further consideration to the case where all the coefficients
of variation are equal. And we shall confine ourselves to two special cases of
this. In the one $\kappa = \lambda = \mu = \rho$, that is to say the variates are
equally, and therefore of course positively, correlated. This is not very far
from the case with the human skull. In the other, $\kappa = 1$, whilst
$\lambda = \mu = \rho$. This is appropriate to an approximate solid of
revolution, such as many eggs and fruits, or to a regular prism such as some
sponge spicules.

In the first case we have for the moments of products of three variables:

$$\overline{x^2yz} = \rho(1 + 2\rho),$$
$$\overline{x^4yz} = 3\rho(1 + 4\rho), \quad \overline{x^3y^2z} = 3\rho(1 + 2\rho + 2\rho^2), \quad \overline{x^2y^2z^2} = 1 + 6\rho^2 + 8\rho^3,$$
$$\overline{x^4y^3z} = 3\rho(3 + 12\rho + 12\rho^2 + 8\rho^3), \quad \overline{x^4y^2z^2} = 3(1 + 10\rho^2 + 16\rho^3 + 8\rho^4),$$
$$\overline{x^3y^3z^2} = 3\rho(3 + 6\rho + 14\rho^2 + 12\rho^3),$$
$$\overline{x^4y^4z^2} = 3(3 + 48\rho^2 + 96\rho^3 + 104\rho^4 + 64\rho^5),$$
$$\overline{x^4y^3z^3} = 9\rho(3 + 12\rho + 26\rho^2 + 40\rho^3 + 24\rho^4),$$
$$\overline{x^4y^4z^4} = 27(1 + 24\rho^2 + 64\rho^3 + 104\rho^4 + 128\rho^5 + 64\rho^6).$$

So if $X = m_1(1 + k^{1/2}x)$, $Y = m_2(1 + k^{1/2}y)$, $Z = m_3(1 + k^{1/2}z)$, and
$m_1m_2m_3 = V$, the volume which is the product of the means, then the moments
of $XYZ$ about zero are

$$\mu'_2 = V^2[1 + 3(1+4\rho)k + 3(1 + 4\rho + 10\rho^2)k^2 + (1 + 6\rho^2 + 8\rho^3)k^3],$$
$$\mu'_3 = V^3[1 + 9(1+3\rho)k + 27(1 + 5\rho + 8\rho^2)k^2 + 9(3 + 21\rho + 54\rho^2 + 62\rho^3)k^3 + 27\rho(3 + 6\rho + 14\rho^2 + 12\rho^3)k^4],$$
$$\mu'_4 = V^4[1 + 6(3+8\rho)k + 9(13 + 64\rho + 88\rho^2)k^2 + 36(9 + 64\rho + 160\rho^2 + 152\rho^3)k^3$$
$$\qquad + 27(13 + 128\rho + 448\rho^2 + 768\rho^3 + 568\rho^4)k^4 + 54(3 + 24\rho + 144\rho^2 + 304\rho^3 + 424\rho^4 + 256\rho^5)k^5$$
$$\qquad + 27(1 + 24\rho^2 + 64\rho^3 + 104\rho^4 + 128\rho^5 + 64\rho^6)k^6].$$

Hence

$$\kappa_1 = \mu'_1 = V(1 + 3\rho k),$$
$$\kappa_2 = \mu_2 = V^2k[3(1+2\rho) + 3(1 + 4\rho + 7\rho^2)k + (1 + 6\rho^2 + 8\rho^3)k^2],$$
$$\kappa_3 = \mu_3 = 6V^3k^2[3(1+2\rho)^2 + (4 + 27\rho + 60\rho^2 + 53\rho^3)k + 3\rho(4 + 9\rho + 18\rho^2 + 14\rho^3)k^2],$$
$$\mu_4 = 3V^4k^2[9(1+2\rho)^2 + 2(37 + 222\rho + 471\rho^2 + 350\rho^3)k + 3(39 + 316\rho + 1038\rho^2 + 1584\rho^3 + 1001\rho^4)k^2$$
$$\qquad + 18(3 + 24\rho + 127\rho^2 + 268\rho^3 + 346\rho^4 + 192\rho^5)k^3 + 9(1 + 24\rho^2 + 64\rho^3 + 104\rho^4 + 128\rho^5 + 64\rho^6)k^4],$$
$$\kappa_4 = 6V^4k^3[28(1+2\rho)^3 + 3(17 + 144\rho + 468\rho^2 + 688\rho^3 + 411\rho^4)k$$
$$\qquad + 12(2 + 17\rho + 92\rho^2 + 193\rho^3 + 241\rho^4 + 130\rho^5)k^2 + 2(2 + 51\rho^2 + 140\rho^3 + 225\rho^4 + 264\rho^5 + 128\rho^6)k^3]. \tag{12}$$


These expressions reduce to equations (1) if $\rho = 1$, and (7) if $\rho = 0$.
When $k$ is small,

$$c^2 = 3(1+2\rho)k, \qquad \gamma_1 = 2c[1 + O(c^2)], \qquad \gamma_2 = \frac{56c^2}{9}[1 + O(c^2)].$$

Thus the relation of $\gamma_1$ and $\gamma_2$ to $c$ is almost independent of the
coefficient of correlation.

We next consider the case when $z = y$, so that $\kappa = 1$, whilst
$\lambda = \mu = \rho$. This is most simply solved by finding the moments of
$(1 + k^{1/2}x)(1 + k^{1/2}y)^2$. Thus if $X = m_1(1 + k^{1/2}x)$, and
$Y = m_2(1 + k^{1/2}y)$, the moments of $XY^2$ about zero are

$$\mu'_1 = V[1 + (1+2\rho)k],$$
$$\mu'_2 = V^2[1 + (7+8\rho)k + 3(3 + 8\rho + 4\rho^2)k^2 + 3(1 + 4\rho^2)k^3],$$
$$\mu'_3 = V^3[1 + 18(1+\rho)k + 18(5 + 11\rho + 5\rho^2)k^2 + 30(5 + 15\rho + 18\rho^2 + 4\rho^3)k^3 + 45(1 + 6\rho + 6\rho^2 + 8\rho^3)k^4],$$
$$\mu'_4 = V^4[1 + 2(17+16\rho)k + 3(127 + 256\rho + 112\rho^2)k^2 + 84(21 + 64\rho + 64\rho^2 + 16\rho^3)k^3$$
$$\qquad + 105(31 + 128\rho + 192\rho^2 + 128\rho^3 + 16\rho^4)k^4 + 630(3 + 16\rho + 32\rho^2 + 32\rho^3 + 16\rho^4)k^5 + 315(1 + 16\rho^2 + 16\rho^4)k^6].$$

Hence

$$\kappa_1 = \mu'_1 = V[1 + (1+2\rho)k],$$
$$\kappa_2 = \mu_2 = V^2k[5 + 4\rho + 4(2 + 5\rho + 2\rho^2)k + 3(1 + 4\rho^2)k^2],$$
$$\kappa_3 = \mu_3 = 2V^3k^2[3(8 + 14\rho + 5\rho^2) + 2(29 + 84\rho + 87\rho^2 + 16\rho^3)k + 9(2 + 14\rho + 13\rho^2 + 16\rho^3)k^2],$$
$$\mu_4 = 3V^4k^2[(5+4\rho)^2 + 8(40 + 113\rho + 95\rho^2 + 22\rho^3)k + 2(427 + 1628\rho + 2376\rho^2 + 1376\rho^3 + 160\rho^4)k^2$$
$$\qquad + 24(24 + 121\rho + 237\rho^2 + 234\rho^3 + 104\rho^4)k^3 + 105(1 + 16\rho^2 + 16\rho^4)k^4],$$
$$\kappa_4 = 24V^4k^3[30 + 80\rho + 65\rho^2 + 14\rho^3 + (95 + 364\rho + 513\rho^2 + 292\rho^3 + 32\rho^4)k$$
$$\qquad + 3(22 + 116\rho + 227\rho^2 + 214\rho^3 + 96\rho^4)k^2 + 3(4 + 67\rho^2 + 64\rho^4)k^3]. \tag{13}$$


Hence, approximately,

$$c^2 = (5 + 4\rho)k,$$
$$\gamma_1 = \frac{6(2+\rho)(4+5\rho)\,c}{(5+4\rho)^2},$$
$$\gamma_2 = \frac{24(30 + 80\rho + 65\rho^2 + 14\rho^3)\,c^2}{(5+4\rho)^3}.$$

Hence $\gamma_1$ varies between $\tfrac{48}{25}c$, or $1.92c$ when $\rho = 0$, and
$2c$ when $\rho = 1$. This is its maximum, so for high values of $\rho$,
$\gamma_1/c$ is nearly constant. It vanishes when $\rho = -0.8$. $\gamma_2$
increases from $\tfrac{144}{25}c^2$, or $5.76c^2$, when $\rho = 0$, to
$\tfrac{56}{9}c^2$, or $6.2c^2$, when $\rho = 1$. This again is its maximum, so
$\gamma_2/c^2$ is nearly constant for large $\rho$. $\gamma_2$ does not vanish for
any admissible values of $\rho$. In fact for positive values of $\rho$ it would be
impossible, except with enormous samples, to distinguish this distribution from
that of the cube of a normal variate. If $\rho = 1$, equations (13) reduce to
equations (1). If $\rho = 0$, they give the cumulants of $XY^2$, where $X$ and $Y$
are uncorrelated normal variates with the same coefficient of variation, namely,

$$\kappa_1 = V(1 + k), \qquad \kappa_2 = V^2k(5 + 8k + 3k^2),$$
$$\kappa_3 = 4V^3k^2(12 + 29k + 9k^2), \qquad \kappa_4 = 24V^4k^3(30 + 95k + 66k^2 + 12k^3),$$

which may easily be obtained independently.
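The near-constancy of $\gamma_1/c$ and $\gamma_2/c^2$ for positive correlation can be seen by tabulating the two approximate ratios just given for $XY^2$. This is a small Python sketch; the function name `xy2_shape_ratios` is an illustrative assumption.

```python
def xy2_shape_ratios(rho):
    """Leading-order ratios (gamma_1 / c, gamma_2 / c**2) for XY^2 with equal
    coefficients of variation, from the approximate expressions in the text."""
    g1_over_c = 6 * (2 + rho) * (4 + 5 * rho) / (5 + 4 * rho) ** 2
    g2_over_c2 = (
        24 * (30 + 80 * rho + 65 * rho**2 + 14 * rho**3) / (5 + 4 * rho) ** 3
    )
    return g1_over_c, g2_over_c2


if __name__ == "__main__":
    for rho in (0.0, 0.25, 0.5, 0.75, 1.0):
        g1, g2 = xy2_shape_ratios(rho)
        print(f"rho = {rho:4.2f}   gamma1/c = {g1:.3f}   gamma2/c^2 = {g2:.3f}")
```

At $\rho = 1$ the ratios coincide with those of the cube of a normal variate, $2$ and $56/9$.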


΄ 


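8. THE PRODUCT OF n CORRELATED NORMAL VARIATES


If the variates are $X_1, X_2, \dots, X_r, \dots, X_n$, where
$X_r = m_r(1 + a_rx_r)$, and $P = \Pi m_r$, while $x_r$ is a reduced normal
variate; and $\rho_{rs}$ is the coefficient of correlation of $x_r$ and $x_s$, and
hence of $X_r$ and $X_s$, the general expression for the moments of
$X_1X_2 \dots X_n$ is complicated. It can however easily be seen that the mean is

$$P[1 + \Sigma\rho_{rs}a_ra_s + \Sigma(\rho_{rs}\rho_{tu} + \rho_{rt}\rho_{su} + \rho_{ru}\rho_{st})a_ra_sa_ta_u + \dots],$$

while

$$\mu_2 = P^2[\Sigma a_r^2 + 2\Sigma\rho_{rs}a_ra_s + \dots].$$

If every $a_r = k^{1/2}$, and every $\rho_{rs} = \rho$, then
$\overline{x_1^{\alpha_1}x_2^{\alpha_2}\dots}$ vanishes when $m = \Sigma\alpha_r$ is
odd, and when $m$ is even it is the coefficient of
$t_1^{\alpha_1}t_2^{\alpha_2}\dots$ in the expansion of
$(\Sigma t_r^2 + 2\rho\Sigma t_rt_s)^{\frac{1}{2}m}$, multiplied by
$\alpha_1!\,\alpha_2!\,\alpha_3!\dots/(2^{\frac{1}{2}m}\{\tfrac{1}{2}m\}!)$. Thus
the moments about zero are:

$$\mu'_1 = P\left[1 + \tfrac{1}{2}n(n-1)\rho k + \tfrac{1}{8}n(n-1)(n-2)(n-3)\rho^2k^2 + \dots + \frac{n!}{2^r\,r!\,(n-2r)!}\rho^rk^r + \dots\right],$$
$$\mu'_2 = P^2[1 + n\{1 + 2(n-1)\rho\}k + \tfrac{1}{2}n(n-1)\{1 + 4(n-2)\rho + 2(2n^2 - 6n + 5)\rho^2\}k^2 + \dots],$$
$$\mu'_3 = P^3[1 + 3n\{1 + \tfrac{3}{2}(n-1)\rho\}k + \tfrac{9}{2}n(n-1)\{1 + (3n-4)\rho + \tfrac{1}{4}(9n^2 - 21n + 14)\rho^2\}k^2 + \dots],$$

so that

$$\kappa_2 = \mu_2 = P^2nk[1 + (n-1)\rho + \tfrac{1}{2}(n-1)\{1 + 4(n-2)\rho + (3n^2 - 9n + 7)\rho^2\}k + \dots],$$
$$\kappa_3 = \mu_3 = 3P^3n(n-1)[1 + (n-1)\rho]^2k^2[1 + O(k)]. \tag{14}$$

Thus

$$c = [n\{1 + (n-1)\rho\}k]^{1/2}[1 + O(k)],$$
$$\gamma_1 = \frac{3(n-1)}{n}\,c\,[1 + O(c^2)], \tag{15}$$

and presumably $\gamma_2$ etc. are approximately the same functions of $c$ as in
the case of equations (9).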


9. THE GALTON-MACALISTER DISTRIBUTION


Consider the distribution of the $n$th power of a normal variate when $n$ is
very large, and $k$ very small, but $nk^{1/2}$ remains constant. All but a few
terms of equations (3) will vanish, and if $nk^{1/2}$ is small, the distribution
becomes identical with the distribution described by MacAlister (1879) whose
first four moments have been given by Pearson (1905). Its cumulants can readily
be found as follows: Given that $x = \log X$ is normally distributed, to find
the distribution of $X$. Let $m$ and $s$ be the mean and standard deviation of
$x$. Then $X = e^x$. Hence, since the moment-generating function of $x$ is
$M(t) = e^{mt + \frac{1}{2}s^2t^2}$, the $r$th moment of $X$ about zero is

$$\mu'_r = 1 + r\mu_1 + \frac{r^2}{2!}\mu_2 + \dots = M(r) = e^{mr + \frac{1}{2}s^2r^2},$$

where $\mu_1$, $\mu_2$, etc. are the moments of $x$ about zero. Let
$M = e^{m + \frac{1}{2}s^2}$, $l = e^{s^2}$. Then
$\mu'_r = M^r l^{\frac{1}{2}r(r-1)}$. Hence the cumulants of $X$ are

$$\kappa_1 = M,$$
$$\kappa_2 = M^2(l-1),$$
$$\kappa_3 = M^3(l-1)^2(l+2),$$
$$\kappa_4 = M^4(l-1)^3(l^3 + 3l^2 + 6l + 6),$$
$$\kappa_5 = M^5(l-1)^4(l^6 + 4l^5 + 10l^4 + 20l^3 + 30l^2 + 36l + 24),$$
$$\kappa_6 = M^6(l-1)^5(l^{10} + 5l^9 + 15l^8 + 35l^7 + 70l^6 + 120l^5 + 180l^4 + 240l^3 + 270l^2 + 240l + 120), \tag{16}$$

or, if $l - 1$ is a small quantity, $q$, approximating to $s^2$,

$$\kappa_1 = M,$$
$$\kappa_2 = M^2q,$$
$$\kappa_3 = M^3q^2(3 + q),$$
$$\kappa_4 = M^4q^3(16 + 15q + 6q^2 + q^3),$$
$$\kappa_5 = M^5q^4(125 + 222q + 205q^2 + 120q^3 + 45q^4 + 10q^5 + q^6),$$
$$\kappa_6 = M^6q^5(1296 + 3660q + 5700q^2 + 6165q^3 + 4945q^4 + 2997q^5 + 1365q^6 + 455q^7 + 105q^8 + 15q^9 + q^{10}).$$

Hence

$$c^2 = q,$$
$$\gamma_1 = 3c + c^3,$$
$$\gamma_2 = 16c^2 + 15c^4 + \dots,$$
$$\gamma_3 = 125c^3 + 222c^5 + \dots,$$
$$\gamma_4 = 1296c^4 + 3660c^6 + \dots. \tag{17}$$

The first terms represent the limiting values of equations (3) and (9). The
leading term of $\gamma_r$, up to $\gamma_4$ at least, is $(r+2)^rc^r$. Thus for a
given coefficient of variation, $\gamma_1$ is 50 % greater than in the case of
the cubed normal variate, $\gamma_2$ is 157 % greater.
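The passage from the moments $\mu'_r = M^r l^{\frac12 r(r-1)}$ to the cumulants can be checked with exact rational arithmetic. The Python sketch below is illustrative only: the function names are assumptions, and the recursion used, $\kappa_n = \mu'_n - \sum_{j=1}^{n-1}\binom{n-1}{j-1}\kappa_j\,\mu'_{n-j}$, is the standard relation between moments about zero and cumulants rather than anything specific to this paper.

```python
from fractions import Fraction
from math import comb


def lognormal_raw_moments(M, l, order):
    """Moments about zero mu'_0..mu'_order, with mu'_r = M**r * l**(r(r-1)/2)."""
    return [M**r * l ** (r * (r - 1) // 2) for r in range(order + 1)]


def cumulants_from_raw(raw):
    """Convert moments about zero (raw[0] == 1) into cumulants kappa_1..kappa_n."""
    kappa = [None]  # index 0 unused so that kappa[j] is the j-th cumulant
    for n in range(1, len(raw)):
        correction = sum(
            comb(n - 1, j - 1) * kappa[j] * raw[n - j] for j in range(1, n)
        )
        kappa.append(raw[n] - correction)
    return kappa[1:]


if __name__ == "__main__":
    M, l = Fraction(1), Fraction(3, 2)  # arbitrary illustrative values
    k = cumulants_from_raw(lognormal_raw_moments(M, l, 6))
    for r, value in enumerate(k, start=1):
        print(f"kappa_{r} = {value}")
```

Using `Fraction` keeps every step exact, so the printed values can be compared directly with the closed forms in equations (16).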




10. BIOLOGICAL APPLICATIONS 


Wilson and Hilferty (1931) showed that the cube root of χ is almost normally 
distributed when n exceeds 2. Haldane (1938) gave the value of the cumulants 


in this case, and showed that, for large values of n, (13x? — n)" is even more nearly 
normally distributed. It is interesting to compare the cumulants of the
cubed-normal distribution [equations (2)] with those of the $\chi^2$
distribution. The latter are $\kappa_1 = n$, $\kappa_2 = 2n$, $\kappa_3 = 8n$,
$\kappa_4 = 48n$, etc., so that $c^2 = 2/n$, $\gamma_1 = 2c$, $\gamma_2 = 6c^2$.
Thus if we compare the $\chi^2$ distribution with a cubed-normal distribution of
the same coefficient of variation, we find that they are equally asymmetrical,
but that the $\chi^2$ distribution has a $\gamma_2$ which is $\tfrac{27}{28}$ths
that of the cubed normal. Hence the same transformation will nearly abolish both
$\kappa_3$ and $\kappa_4$ for both distributions.

The success of Wilson and Hilferty's transformation suggests strongly that
we may use equations (1), (5) or (3) for the approximate normalization of
moderately skew variates. This is an urgent problem in several applications of
statistics to biology (Haldane, 1939). We evaluate a number whose mean and
standard deviation in the case of random sampling are known. We desire to know
whether it differs significantly from the mean. But the sampling distribution is
found to be skew. If we can approximately normalize it, our tests of significance become
far sharper. This problem is taken up in detail elsewhere. If y} >ya and both 
are small, so that k? can be neglected, we can find m, n and k so that (m + kiz)” 
has a given y, and y,. By equations (3) and (4), 


n= se M k= ze m?” 


Kg 
+ T6y3—9y,’ 9(n— 1i oni 
If, however, γι is not small, it is necessary to take several terms of equations (3). 

If in an observed distribution of weights or volumes, the estimate of
$\gamma_1/c$, or of $k_1k_3/k_2^2$, is approximately 2, it will be reasonable to
try whether $\gamma_2/c^2$ or $k_1^2k_4/k_2^3$ approximates to $\tfrac{56}{9}$,
and if so to try to fit a normal distribution to the cube roots of the variate.
It will be seen that this does not imply that all the objects considered are of
the same shape. On the contrary, such a distribution is to be expected whenever
three mutually perpendicular measurements are normally distributed, provided
that their coefficients of variation and correlation are not very different.
And even the greatest differences in the latter, provided they are not negative,
will have little effect.

Unfortunately, reliable estimates of $\gamma_1$ and $\gamma_2$ can only be
obtained from samples of the order of 1000 or more. Rendel (in a paper to be
published shortly) obtained the following estimates for the cumulants of the
weight distribution of 1202 viable duck eggs, corrected for grouping. The unit
is a gram:

$$k_1 = 79.204, \quad k_2 = 42.09, \quad k_3 = +118.068, \quad k_4 = +937.21.$$


Hence the estimates of $c$, $\gamma_1$ and $\gamma_2$ are

$$c = 0.0887 \pm 0.0018, \quad g_1 = +0.432 \pm 0.071, \quad g_2 = +0.524 \pm 0.142.$$

The standard errors are those appropriate to a normal distribution, but the true
values cannot be very different. It is clear that $g_1$ and $g_2$ are
significantly greater than the values of 0.27 and 0.13 which we should expect
were the logarithms of weights normally distributed. They differ still more from
what would be expected were the cube roots of weights normally distributed, or
on any other hypothesis leading to a similar distribution.
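The comparison described above can be set out as a short computation. The following Python sketch is an illustration under stated assumptions, not the author's procedure: the function name and the dictionary keys are hypothetical, and the standard errors are the usual large-sample values for a normal parent, $\sqrt{6/N}$ for $g_1$ and $\sqrt{24/N}$ for $g_2$.

```python
from math import sqrt


def shape_estimates(k1, k2, k3, k4, n):
    """Estimates of c, gamma_1, gamma_2 and of the ratios gamma_1/c and
    gamma_2/c**2 from the first four cumulant estimates of a sample of size n."""
    c = sqrt(k2) / k1
    g1 = k3 / k2**1.5
    g2 = k4 / k2**2
    return {
        "c": c,
        "g1": g1,
        "g2": g2,
        "g1_over_c": k1 * k3 / k2**2,         # about 2 for a cubed normal variate
        "g2_over_c2": k1**2 * k4 / k2**3,     # about 56/9 for a cubed normal variate
        "se_g1": sqrt(6 / n),                 # normal-theory approximation (assumed)
        "se_g2": sqrt(24 / n),                # normal-theory approximation (assumed)
    }


if __name__ == "__main__":
    # Hypothetical cumulants constructed to mimic a cubed-normal variate with c = 0.1.
    result = shape_estimates(1.0, 0.01, 2.0e-4, (56 / 9) * 1.0e-6, n=1000)
    for name, value in result.items():
        print(f"{name:>10s} = {value:.4f}")
```

A ratio `g1_over_c` near 2 together with `g2_over_c2` near 56/9 would point to the cube-root transformation, whereas values near 3 and 16 would point to the logarithm.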

Pearl (1906) lists the moments of the distributions of brain weights of eight
European populations of 197 to 529 individuals. The various estimates of $c$ are
all close to 0.080, ranging from 0.074 to 0.083. Those of $\gamma_1$ range from
+0.11 to +0.40, those of $\gamma_2$ from −0.30 to +1.6. The weighted means are

$$c = 0.07965 \pm 0.00106, \quad g_1 = +0.2306 \pm 0.0461, \quad g_2 = +0.2661 \pm 0.0922.$$

If the cube roots are normally distributed, we should expect $g_1 = +0.16$,
$g_2 = +0.037$; if their logarithms are normally distributed, we should expect
$g_1 = +0.24$, $g_2 = +0.10$. The latter distribution gives the better fit, but
the first is not impossible.

Sinnott (1937) gives a graph of the distribution of the weights of squash fruits
in an F2, which is positively skew. He shows that a graph of the distribution of
their logarithms, though negatively skew, is more symmetrical. There is a
suggestion that a graph of the distribution of their cube roots would be even
more so. Unfortunately the actual figures are not given, and since curve-fitting
by eye is notoriously uncertain, no more can be said. It is much to be desired
that, when the full data are not given, estimates of the first four moments or
cumulants should be published.

11. SUMMARY

The first four moments of a number of powers and products of normal variates
are calculated, with special reference to the probable distributions of weights,
volumes, or areas of organs and organisms. In each case the first two measures
($\gamma_1$ and $\gamma_2$, or $\beta_1$ and $\beta_2$) of deviation from
normality are obtained in terms of the coefficients of variation. The
expressions obtained are almost independent of the correlation between the
linear measurements, provided the coefficients of variation of the latter are
approximately equal. The distribution found is perhaps applicable to data on
brain weights, but not to data on ducks' eggs.


REFERENCES


CRAIG, C. C. (1936). On the frequency function of xy. Ann. Math. Statist. 7, 1.
HALDANE, J. B. S. (1938). The approximate normalization of a class of frequency
    distributions. Biometrika, 29, 392-404.
—— (1939). New data on partial sex-linkage in man. Proc. Int. Genet. Congr. p. 197.
MACALISTER, D. (1879). The law of the geometric mean. Proc. Roy. Soc. 29, 367.
PEARL, R. (1905). Biometrical studies on man. I. Variation and correlation in
    brain-weight. Biometrika, 4, 13-104.
PEARSON, K. (1905). Das Fehlergesetz und seine Verallgemeinerungen durch Fechner
    und Pearson. A rejoinder. Biometrika, 4, 169-212.
SINNOTT, E. W. (1937). The relation of gene to character in quantitative
    inheritance. Proc. Nat. Acad. Sci., Wash., 23, 224-7.
WILSON, E. B. & HILFERTY, M. M. (1931). The distribution of Chi-square. Proc.
    Nat. Acad. Sci., Wash., 17, 684.

THE TRANSFORMATION OF DATA FROM ENTOMOLOGICAL
FIELD EXPERIMENTS SO THAT THE ANALYSIS
OF VARIANCE BECOMES APPLICABLE†


By GEOFFREY BEALL
Dominion Entomological Laboratory, Chatham, Ontario, Canada


1. INTRODUCTORY

THE present paper deals with experiments on the control of insects in the field.
In such experimental work the problem to be investigated is whether more insects
survive on plots which have been subjected to one treatment than on plots
subjected to another. It will be shown in the present paper that the numbers of
insects found per plot must vary in such a way that one cannot, strictly, subject
the results to the analysis of variance, and it is proposed to find how the data
may be transformed so that analysis of variance becomes applicable. Such
transformation has been discussed by Bartlett (1936a, b) in connexion with
entomological experiments, and by Tippett (1934) in connexion with industrial
experiments.


2. EXPERIMENTAL RESULTS CONSIDERED 

The data used in the following work are results from seven insecticidal
experiments arranged by the author at Chatham, Ontario. The work was carried
out with replicated blocks containing plots subjected to treatments of which the
assignment was random. This procedure, normal in agronomic work, was
supplemented by one repetition of each treatment within a block. The assignment
of the repetition of a treatment was independent of the first for that treatment,
except that, of course, the same plot could not be chosen twice. This repetition
was carried out to obtain estimates of variability within blocks. In these
experiments complete counts were not made but random sampling was employed.
Experiments on Pyrausta nubilalis Hubn., reported by Beall et al. (1939), for
which results are shown in Tables 1 and 2, were made on one area at two different
periods, whereas experiments on Leptinotarsa decemlineata Say, for which results
are indicated in Tables 3 and 4, were carried out on contiguous areas at the same
time. Three similar experiments were carried out in one place on the tobacco
hornworm, Phlegethontius quinquemaculata Haw., for which the data are shown
in Tables 5-7. Reference is also made to the data from a uniformity trial on
insects of Beall (1939).


t Publication No. 2101, Division of Entomology, Science Service, Department of Agriculture, 
Ottawa, Canada. : 


Table 1. Numbers of an insect, Pyrausta nubilalis, per plot. Experiment I 





Block 







































































1 25 22 | 80 
1 19 18 98 

9 21 απ. 19 

2 19 19 10 

8 48 98 27 

* 3 18 21 18 
4 13 22 17 

4 11 34 | 7 

5 23 20 84 

' b 90 15 34 

6 15 18 16 

6 16 27 23 

7 44 56 45 

7 52 89 58 


Table 3. Numbers of an insect, Leptinotarsa decemlineata,
per plot. Experiment III





Treatment 


Blook 


































Table 4. Numbers of an insect, Leptinotarsa decemlineata,
per plot. Experiment IV




















Table 5. Numbers of an insect, Phlegethontius quinquemaculata,
per plot. Experiment V
2 Block 
Treatment z — = 3 
; 1 | 2 3 4 5 6 
. ` ‘| 
Lu M 6 'b 6. 3 | ο 1 
1 . 4 15 13 6 10 15 
2 0 1 1 0 |; 1” 1 
2 2 2 1 4 1 1 
8 15 17 22 28 | 8 16 
‘ 3 - 12 22 l6 |. H fp 13 25 


Table 6. Numbers of an insect, Phlegethontius quinquemaculata, 
per plot. Experiment VI ΄ 
























f 
“Block 
: Treatment y 
1 8 4 5 8 
E 

1 18 9 4 11 4 

1 9 5 7 5 10 

2 6 8 1 5 7 

2 9 9 4 12 7 

3 9 5 4 8 9 
$5 9 |, 4 7 3 2 

4 1 0 1 4 3 
4 2 2 1 4 5 

5 1 12 3 6 1 
5 5 4 1 9 8 

6 . 6 8 9 6 5 

6 10 2 4 4 12 




















Table 7. Numbers of an insect, Phlegethontius quinquemaculata, 
per plot. Experiment VII 


Treatment 





1 
1 
2 
2 
3 
3 
4 
4 
6 
5 
6 
6 

















3. THE RELATIONSHIP BETWEEN THE STANDARD DEVIATION AND THE MEAN 
IN THE EXPERIMENTAL DATA 


If $x$ is the number of insects on one of a group of small contiguous areas, say
plots, within a larger area, say a block, let the expectation of $x$ over all
these plots be $M$ and the standard deviation be $\sigma$; then over a number of
the larger areas, when the insects are distributed in a completely random
fashion, from the Poisson distribution,

$$\sigma^2 = M. \tag{1}$$

As is discussed by 'Student' (1919) one cannot, however, anticipate that (1) will
be satisfied when organisms occur in groups, as, say, when insects come from
masses of eggs, or when there is a change in expectation from plot to plot within
a block. Generally, $\sigma^2$ will tend to be greater than $M$ and we can only say

$$\sigma^2 = f(M). \tag{2}$$

The form of $f(M)$, in (2), must be considered carefully, since it bears on the
form of the transformation which may be developed to make the standard deviation
independent of the mean.

In dealing with (2), Bartlett (1936a) started by supposing that, approximately,

$$\sigma^2 = KM, \tag{3}$$

where $K$ is a constant. Generally, in field data, however, the relationship
between $\sigma^2$ and $M$, or of their respective estimates, $s^2$ and $\bar{x}$,
does not, as in Fig. 1, appear to be linear; rather, the departure of $s^2$ from
$\bar{x}$ becomes disproportionately great as $\bar{x}$ increases. This
relationship between departures and the magnitude of the mean has been discussed
by Clapham (1936) in connexion with data on the distribution of organisms
differing from insects as much as flowering plants, and he showed that only those
distributions with very low mean have the squared standard deviation close to
the mean.

Our discussion above on the shortcomings of (3) suggests the conclusion that

$$\sigma^2 - M \propto M \tag{4}$$

is generally untrue. We propose to consider the possibility that the
curvilinearity of (2) might be better met by supposing that

$$\sigma^2 - M \propto M^2. \tag{5}$$

Equation (5) leads to

$$\sigma^2 = M + kM^2, \tag{6}$$

where $k$ is a constant. It will be noticed that

$$k = (\sigma^2 - M)M^{-2} \tag{7}$$

is the Charlier coefficient of disturbance from a Poisson distribution. This
coefficient was employed by Beall (1935).

It is possible to consider the suitability of (3), as compared with (6), by
finding how, respectively, they fit observations on $s^2$ and $\bar{x}$. To fit
exactly is difficult, and it was found necessary to fall back on an empirical
determination of $K$ and of $k$; thus, if there are a number of pairs of
estimates, $\bar{x}$ and $s^2$, from (3) and (6) we estimate

$$K = \Sigma s^2/\Sigma\bar{x}, \tag{8}$$
$$k = (\Sigma s^2 - \Sigma\bar{x})/\Sigma\bar{x}^2, \tag{9}$$

where $\Sigma$ represents the summation over all pairs.
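The empirical determinations (8) and (9) amount to a few sums. As a minimal sketch in Python (the function name `fit_variance_laws` and its argument names are illustrative assumptions), given paired estimates of mean and squared standard deviation:

```python
def fit_variance_laws(means, variances):
    """Estimate K of sigma^2 = K*M, equation (8), and k of
    sigma^2 = M + k*M^2, equation (9), from paired estimates of the mean
    and of the squared standard deviation."""
    if len(means) != len(variances) or not means:
        raise ValueError("need equally many means and variances, at least one pair")
    sum_mean = sum(means)
    sum_var = sum(variances)
    sum_mean_sq = sum(m * m for m in means)
    K = sum_var / sum_mean                    # equation (8)
    k = (sum_var - sum_mean) / sum_mean_sq    # equation (9)
    return K, k


if __name__ == "__main__":
    # Hypothetical values lying exactly on sigma^2 = M + 0.25*M^2.
    means = [1.0, 2.0, 4.0]
    variances = [m + 0.25 * m * m for m in means]
    print(fit_variance_laws(means, variances))
```

With data lying exactly on the curve (6), the estimate of $k$ returns the generating constant, while $K$ depends on how the means happen to be spread.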




Since in the work presented in §2, $\bar{x}$ and $s^2$, being based on only two
observations, are highly variable, these experiments do not show clearly the
suitability of (3) and (6). Accordingly, reference is made instead to the data
from the uniformity trial on Leptinotarsa decemlineata Say of Beall (1939). When
the mean and standard deviation of 144 sampling units within each of 16 areas
were considered, the estimates from (8) and (9) were $K = 2.405$ and
$k = 0.2548$. For these values from (1), (3) and (6), curves, described as lines
1, 2 and 3 respectively, are plotted in Fig. 1; the observed values of mean and
squared standard deviation are also shown. In the cases where the mean is near
unity the departure of the squared standard deviation from the mean, i.e. from
line 1, appears to be trivial, but as the mean increases the departure becomes
more marked. It can be seen that the observations lie more snugly about line 3
from (6) than about line 2 from (3). Generally, for the data from field studies
the same effect has been observed. Such results suggest that (6) may be generally
a better approximation to the form of $f(M)$ than (3) and make it preferable to
proceed with the transformation of data from the assumption (6).

Fig. 1. The squared standard deviation plotted against the mean for 144 small
areas within each of 16 large areas; line 1 is from equation (1), line 2 from
(3) and line 3 from (6). The counts had been made on Leptinotarsa decemlineata
Say.




4. THE TRANSFORMATIONS OF FIELD DATA 


Fig. 1 shows clearly how, within an area, the variability of the numbers of
insects on sub-areas is related to the mean number of insects per sub-area. This
relationship will make invalid the use of the analysis of variance on
experimental results involving counts on insects, since the expectation of the
variance should be the same for all plots. To overcome this invalidity, Bartlett
(1936a) suggested transforming the observations, $x$, from the basis of (3). The
transformation found was $x^{1/2}$, which Bartlett modified to
$(x + \tfrac{1}{2})^{1/2}$. From §3 it was seen, however, that for field data the
relationship between standard deviation and mean may be represented better by
equation (6) than by (3), and, since the form of the transformation depends on
the form of $f(M)$, a fresh transformation must be sought. A transformation, as
is developed in the Appendix to the present paper, is suggested by the method of
Tippett (1934), i.e.

$$x' = k^{-1/2}\sinh^{-1}(kx)^{1/2}. \tag{10}$$

An advance note of this transformation was published by Beall (1940). The
adequacy of this transformation must be judged from the extent to which it
stabilizes variability. In (10), if we express $\sinh^{-1}(kx)^{1/2}$, when
$kx < 1$, as a well-known series, we have

$$x' = x^{1/2} - \tfrac{1}{6}kx^{3/2} + \tfrac{3}{40}k^2x^{5/2} - \tfrac{5}{112}k^3x^{7/2} + \dots, \tag{11}$$

where it is obvious that for $k = 0$, $x' = x^{1/2}$. Of course, for large values
of $kx$, $x'$ varies almost as $\log x$, or as the $\log(x+1)$ used by Williams
(1937), and so our proposed expansion may be regarded, for practical purposes, as
embracing the root and logarithmic transformations.

Table 8 gives the transformation, (10), for a probable range of observations,
$x$, and for $k$ at intervals which will probably be close enough for practical
purposes. This table was computed in part by inverse interpolation from the table
of hyperbolic functions of the Smithsonian Mathematical Tables (Becker & Van
Orstrand, 1931), and in part from (12). Should values of $x'$ be required outside
those of Table 8, these can conveniently be calculated from

$$x' = k^{-1/2}\log_e\{(kx)^{1/2} + (1 + kx)^{1/2}\}. \tag{12}$$

In preparing Table 8 the question arose of whether, instead of dealing with
$k^{-1/2}\sinh^{-1}(kx)^{1/2}$, one should not use
$k^{-1/2}\sinh^{-1}\{k^{1/2}(x + \tfrac{1}{2})^{1/2}\}$ in the same way as Bartlett
(1936a) dealt with the transformation, $(x + \tfrac{1}{2})^{1/2}$, instead of
$x^{1/2}$. This modification was rejected on the basis of results of the
transformation, as discussed in §5, since it was found that the addition of
$\tfrac{1}{2}$ made little difference and did not give, consistently, an
improvement.
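Where tabulated values are not to hand, the transformation (10) and its logarithmic form (12) can be computed directly. The following Python sketch is illustrative: the function names are assumptions, and treating $k = 0$ as the square-root transformation is a convention adopted here from the limiting behaviour of the series (11).

```python
from math import asinh, log, sqrt


def transform(x, k):
    """x' = k**(-1/2) * asinh(sqrt(k*x)), equation (10); sqrt(x) when k == 0."""
    if x < 0 or k < 0:
        raise ValueError("x and k must be non-negative")
    if k == 0:
        return sqrt(x)  # limiting case of the series (11)
    return asinh(sqrt(k * x)) / sqrt(k)


def transform_log_form(x, k):
    """The same quantity written as in equation (12); requires k > 0."""
    return log(sqrt(k * x) + sqrt(1 + k * x)) / sqrt(k)


if __name__ == "__main__":
    k = 0.25  # hypothetical value of the constant k
    for count in (0, 1, 5, 10, 50):
        print(count, round(transform(count, k), 4), round(transform_log_form(count, k), 4))
```

The two functions agree to rounding error, since $\sinh^{-1}z = \log_e\{z + (1+z^2)^{1/2}\}$.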

For field data, in making the transformation (10), it is necessary to estimate
the value of $k$ empirically by (9), for which estimates, $\bar{x}$ of the mean
and $s$ of the standard deviation, must be found. The most obvious method in
practice of making these estimates seems to be to put more than one plot
subjected to a given treatment in a block and so to estimate the chance variation
of results for a plot within a block. In the present work, as is discussed in §2,
two plots were subjected to a given treatment in each block and this is probably
good practice.


Table 8. The transformation $x' = k^{-1/2}\sinh^{-1}(kx)^{1/2}$



In the special case where there are two plots for the $i$th treatment ($i = 1, \ldots, n$) in the $j$th block ($j = 1, \ldots, N$), and so two observations, $x_{ij1}$ and $x_{ij2}$, the estimate of the mean will be written $\bar{x}_{ij}$, and of the squared standard deviation

$$s_{ij}^2 = \tfrac{1}{2}(x_{ij1} - x_{ij2})^2. \tag{13}$$

Then from (9), we estimate $k$ from

$$k = 2\left[\sum_{i=1}^{n}\sum_{j=1}^{N}(x_{ij1}-x_{ij2})^2 - \sum_{i=1}^{n}\sum_{j=1}^{N}(x_{ij1}+x_{ij2})\right]\left[\sum_{i=1}^{n}\sum_{j=1}^{N}(x_{ij1}+x_{ij2})^2\right]^{-1} \tag{14}$$

and the calculation is very light.
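
The estimate in (14) needs only three running sums over the pairs of duplicate plots. The following is a minimal sketch, assuming the counts are held as a list of (first plot, second plot) pairs; the function name `estimate_k` and the sample counts are hypothetical and chosen only for illustration.

```python
def estimate_k(pairs):
    """Estimate k from duplicate-plot counts, following equation (14).

    `pairs` holds one (x_ij1, x_ij2) tuple per treatment-block cell.
    """
    sum_diff_sq = sum((a - b) ** 2 for a, b in pairs)    # sum of squared differences
    sum_totals = sum(a + b for a, b in pairs)            # sum of pair totals
    sum_totals_sq = sum((a + b) ** 2 for a, b in pairs)  # sum of squared pair totals
    return 2.0 * (sum_diff_sq - sum_totals) / sum_totals_sq


# Hypothetical counts, not taken from the experiments in this paper.
example_pairs = [(4, 6), (10, 2), (7, 7)]
k_hat = estimate_k(example_pairs)
```

A negative estimate can arise when the squared differences are small relative to the totals; the paper's Table 9 pairs one negative estimate with an employed value of zero.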


5. RESULTS SHOWING THE EFFECT OF THE TRANSFORMATION
ON THE VARIABILITY OF DATA


The adequacy of our proposed transformation may be judged in two ways:
first, with respect to its effect, which we shall consider in the present section, on
the differences between repetitions of a treatment within a block, and secondly,
with respect to its effect, which we shall consider in §6, on the behaviour of the
quantities submitted to the analysis of variance.

It is a fundamental assumption in the analysis of variance that the chance variability for each plot shall be, when the effects of block and of treatment are removed, normally distributed with a standard deviation common to all plots, in which situation of course the standard deviation of the chance variability for a given plot is independent of the expectation for that plot. In the data of the present work, where each treatment is repeated in each block, it is possible to examine the estimates of this standard deviation, $s_{ij}$, and of the expectation, $\bar{x}_{ij}$. For a clear graphical illustration of the situation consider Fig. 2, as obtained from the original data of Experiment III on Leptinotarsa decemlineata, where $s_{ij}$ is plotted against $\bar{x}_{ij}$, and contrast this situation with that obtaining for the corresponding quantities $s'_{ij}$ and $\bar{x}'_{ij}$ obtained after transformation ($k = 0.08$) in Fig. 3.

In Fig. 2 the points are widely scattered as is natural from a sample of two; nevertheless, it is apparent that for the smallest values of $\bar{x}_{ij}$ the values of $s_{ij}$ are correspondingly small and fall in a close group. In Fig. 3 the cluster of observations in the lower left-hand corner of the previous diagram has disappeared, and generally the scatter appears to be independent of $\bar{x}'_{ij}$, so that apparently the transformation gave satisfactory results. The nature of the material involved is such that it does not seem possible to examine the relationship under consideration more exactly, nor to summarize exactly the corresponding results for the other treatments; it can only be said that the same type of result appeared although the magnitude of the relationship before transformation depended on the magnitude of the differences between the effects of treatments.

The results shown in Figs. 2 and 3 suggest that the proposed transformation
has tended to make the standard deviation independent of the mean, in accordance


GEOFFREY BEALL


with the assumptions underlying the analysis of variance. In using this procedure
one actually assumes, more broadly, that a common standard deviation exists, so
that the homoscedasticity of observations before and after transformation should
be tested. Thus it is assumed that $x_{ij1}$ and $x_{ij2}$ are observations from a normal





Fig. 2. The standard deviation and mean as estimated from plots by pairs, with untransformed data on Leptinotarsa decemlineata.

Fig. 3. The standard deviation and mean as estimated from plots by pairs, with the transformed data on Leptinotarsa decemlineata Say, i.e. using $x' = k^{-1/2}\sinh^{-1}\sqrt{kx}$ ($k = 0.08$).


population with a standard deviation, $\sigma$, which is independent of $i$ and $j$. Then $\sum_{k=1}^{2}(x_{ijk}-\bar{x}_{ij})^2/\sigma^2$ is distributed as $\chi^2$ with one degree of freedom.† Accordingly,


† In the $L_1$ test, discussed by Nayer (1936), this case of estimates of standard deviation with one degree of freedom is troublesome since zero values tend to arise when dealing with grouped or integral observations. When this is the case $L_1$, which is the ratio of an arithmetic to a geometric mean of sums of squares, cannot be calculated. The present treatment may therefore have a wider application.










$y_{ij} = (x_{ij1} - x_{ij2})/(\sqrt{2}\,\sigma)$ should be distributed normally with unit standard deviation for all $i$ and $j$. In order to test the hypothesis of normality with unit standard deviation it is only necessary to test for leptokurtosis; for the distribution must be symmetrical since the sign of differences, and therefore of $y_{ij}$, is a matter of chance. Since the number of items involved will almost certainly be < 100, and since the population mean is zero, the $w_n$ criterion of Geary (1935) will provide an appropriate test. In using this criterion we must find the ratio of the mean deviation to the standard deviation, i.e.


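$$w_n = \Big[\sum_{i=1}^{n}\sum_{j=1}^{N} |x_{ij1}-x_{ij2}|\Big]\Big[nN \sum_{i=1}^{n}\sum_{j=1}^{N} (x_{ij1}-x_{ij2})^2\Big]^{-1/2}. \tag{15}$$

Of course, values of $w_n$ can be calculated for transformed data by substitution of $x'_{ijk}$ for $x_{ijk}$.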
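
The ratio of mean deviation to standard deviation about a known zero mean can be sketched as below. This is an illustrative reading of the criterion, assuming the differences between duplicate plots are used directly (the unknown $\sigma$ cancels in the ratio); the function name `geary_ratio` is hypothetical. For normal differences the ratio is expected to be close to $\sqrt{2/\pi}$, and leptokurtosis pulls it lower.

```python
import math


def geary_ratio(pairs):
    """Mean absolute difference divided by root-mean-square difference.

    The population mean of the differences is taken as zero, so no
    sample mean is subtracted.
    """
    diffs = [a - b for a, b in pairs]
    m = len(diffs)
    mean_abs = sum(abs(d) for d in diffs) / m
    rms = math.sqrt(sum(d * d for d in diffs) / m)
    return mean_abs / rms


# Expected value of the ratio for a zero-mean normal variate.
NORMAL_EXPECTATION = math.sqrt(2.0 / math.pi)
```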


Table 9. The $w_n$ test on the homoscedasticity of counts by plots within a block for six field experiments

                    0.05 limits      Untransformed       Transformed (Bartlett)   Value of k            Transformed (Beall)
Experiment   nN     Upper   Lower    w_n      Departure  w_n      Departure       Estimated  Employed   w_n      Departure
                                              by S.D.             by S.D.                                        by S.D.
I            …      0.841   0.757    0.6659   -5.33      0.7525   -1.91            0.078     0.08       0.7846   -0.64
II           …      0.841   0.757    0.7885   -0.49      0.7807   -0.79            0.046     0.04       0.7499   -2.01
III          …      0.866   0.737    0.5873   -6.33      0.6431   -4.15            0.084     0.08       0.6838   -3.11
IV           …      0.872   0.732    0.5692   -5.67      0.6948   -2.67            0.285     …          0.7370   -1.06
V            …      0.881   0.728    0.7554   -1.16      0.8155   +0.15           -0.082     0.00       0.7823   -0.58
VI           …      0.857   0.745    0.7959   -0.22      0.8166   +0.38            0.019     0.02       0.8130   +0.28





Values of $w_n$ from the untransformed observations and from the transformed observations, both following Bartlett (i.e. the transformation $(x+\tfrac{1}{2})^{1/2}$) and following the line suggested in the present paper, are shown in Table 9 for the field data of Tables 1-6. For the second transformation the values of $k$ as calculated from (14) are shown as well as the nearest value of $k$ entered in Table 8. For each experiment the value of $nN$ and also the 0.05 limits of probability, from Geary (1935), are shown. There are also shown the departures of observed $w_n$ from the expected value in terms of the standard deviation, a useful criterion since the distribution of $w_n$ is almost normal. From Table 9 it can be seen that out of the three experiments in which $w_n$ fell beyond the lower 5% limit of probability for the untransformed data and the data transformed as $(x+\tfrac{1}{2})^{1/2}$, in only one experiment did $w_n$ fall so with the final transformation. The results for Experiment II, in which $w_n$ is decreased by the transformation, are peculiar.

Consideration of the departures from the mean in terms of the standard deviation indicates more clearly the improvement effected by each transformation and how the transformation suggested in the present work secures an improvement of the same, but more marked, character than that secured from the transformation of Bartlett. The results suggest that while homoscedasticity may not be attained always, it will be approached by means of the proposed transformation.


6. THE EFFECT OF THE TRANSFORMATION ON THE ANALYSIS
OF VARIANCE


As was indicated at the beginning of §5, our proposed transformation besides
making the variability within a block for a repeated treatment the same for all 
treatments and blocks, should also provide quantities satisfying the assump- 
tions underlying the analysis of variance. Since it is not quite clear how, in so far 
as the transformation is satisfactory in the first way, it will necessarily be satis- 
factory in the second, it will be well to consider directly the suitability of our 
transformed values for the analysis of variance. 

In the application of the analysis of variance one would deal with $\bar{x}_{ij}$ rather than with $x_{ijk}$, and suppose that

$$\bar{x}_{ij} = A + B_i + C_j + D_{ij}, \tag{16}$$

where $A$ is a contribution from the general level of population on the experimental area, $B_i$ the contribution of the $i$th treatment and $C_j$ the contribution of the $j$th block. The remainder term, $D_{ij}$, is called the interaction of treatments and blocks. Of course, the present discussion on the untransformed values, $\bar{x}_{ij}$, holds for the transformed values, $\bar{x}'_{ij} = \tfrac{1}{2}(x'_{ij1}+x'_{ij2})$, when the appropriate symbols, $A'$, $B'_i$, $C'_j$ and $D'_{ij}$, are used.

In material satisfying the conditions underlying the analysis of variance, for the observations under each treatment, the calculated squared standard deviation is

$$s_i^2 = (N-1)^{-1}\sum_{j=1}^{N}(\bar{x}_{ij}-\bar{x}_{i\cdot})^2. \tag{17}$$

Following the argument of the analysis of variance, $\bar{x}_{ij}-\bar{x}_{i\cdot}$, of which the mean is 0, is an estimate of $C_j + D_{ij}$, in which the two terms are independent; hence the expectation of $s_i^2$ is

$$\sigma_s^2 = \sigma_C^2 + \sigma_D^2, \tag{18}$$

where $\sigma_C$ and $\sigma_D$ are the standard deviations of the parameters, $C_j$ and $D_{ij}$ respectively, and are independent of treatment. Accordingly, $s_i^2$ should be independent of treatment and distributed as an estimate of $\sigma_s^2$ having $N-1$ degrees of freedom. Conversely, if $\bar{x}_{ij}$ cannot be built of the independent terms of (16), then the various values of $s_i$ will not be distributed as estimates of a single standard deviation. The hypothesis that the values of $s_i$ in any one experiment are estimates of one quantity may be tested.†


† From correspondence with Dr R. W. B. Jackson, the writer has learned that he had arrived independently at the same test.


The results of the tests on the homogeneity of the values, $s_i$, within the six experiments treated in the present paper are presented in Table 10, where the value of the $L_1$ criterion is shown for the original data and for the transformed values together with the appropriate 0.05 and 0.01 levels of probability (Nayer, 1936). From Table 10 it can be seen that of the values of $L_1$ obtained from the original data, all but one are near or beyond the 0.05 level of significance, but that after transformation all are moved in to less significant values. Accordingly, the values of $s_i$, when calculated from the original data, appear heterogeneous but the corresponding values obtained after the transformation appear homogeneous. Thus it is more probable that the analysis of variance is applicable to the transformed data than to the untransformed.


΄ 


= - / 
Table 10. The homogeneity, as measured by the criterion $L_1$, of the estimates $s_i$ for various values of $i$ before and after transformation

Experiment
$L_1$ before transformation
$L_1$ after transformation
1% limit
5% limit





In Table 10 we have tested the homogeneity of the estimates, $s_i$, as in §5 we tested the homogeneity of $s_{ij}$, that is without reference to the values of the associated means. In view of our original assumptions we are, however, interested in the possibility that the standard deviations, as calculated, might show every sign of being estimates of a common standard deviation and yet be dependent on the associated means. Accordingly, we have investigated such dependence roughly by fitting by least squares a first order regression of $s_i$ on $\bar{x}_{i\cdot}$. From this fitting we record the sign of the regression as follows:








Before transformation
After transformation










By a single asterisk we have indicated cases where the reduction in variability effected by the regression passed the 5% probability limit and by a double asterisk where it passed the 1% limit. Several points may be noted. (1) In two cases (Experiments 2 and 6) after transformation the residual sum of squares about the regression was greater than the reduction in squares due to the regression, whereas it was consistently less before transformation. (2) As can be seen above, the regression generally did not effect a significant reduction in variability after transformation but did before (the small number of degrees of freedom made high significance difficult of attainment). (3) After transformation the sign of the regression seemed to be a chance matter, whereas before transformation it was consistently positive. These results suggest that the transformation proposed did tend to make the variability within a given treatment independent of the mean for that treatment.


7. THE EFFECT OF TRANSFORMATION UPON THE CONCLUSIONS 
FROM THE ANALYSIS OF VARIANCE 


It has been shown in §§ 5 and 6 that the analysis of variance can be made on 
entomological data when a suitable transformation has been effected. It is of 
practical interest to see what numerical effect such transformation will have upon 
tests on the significance of, say, the effect of treatment and the significance of 
differences for treatments.

First, consider the numerical results to be obtained from the analysis of variance (1) without and (2) with transformation. Thus the mean square ascribable to blocks, treatments and their interaction is shown in Table 11, for six experiments of which the data are given in §2; parallel results are presented for untransformed observations and for observations transformed by (10) with the values of $k$ from Table 9. To facilitate the comparison of the results, the mean square for blocks and for treatments is expressed in terms of the estimate for interaction, as the $F$ of Snedecor (1934), and presented in each case. The transformation of the data has modified the conclusions to be drawn from the analysis of variance in Table 11, in that there are considerable changes in the criterion, $F$, for treatments or for blocks. In the examples shown the effect of treatments was highly significant in all cases and so the changes introduced by transformation did not alter the conclusions, as would have been the case for less definite effects.
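
As a small check on how the $F$ values are formed, the sketch below divides each mean square by the interaction mean square, using the Experiment I figures as they are read from Table 11. The dictionary layout and the helper name `f_ratios` are hypothetical.

```python
def f_ratios(mean_squares):
    """Express block and treatment mean squares relative to interaction."""
    error = mean_squares["interaction"]
    return {
        name: value / error
        for name, value in mean_squares.items()
        if name != "interaction"
    }


# Experiment I (P. nubilalis), mean squares as read from Table 11.
untransformed = {"blocks": 92.8, "treatments": 2839.0, "interaction": 76.8}
transformed = {"blocks": 0.565, "treatments": 7.51, "interaction": 0.341}

f_before = f_ratios(untransformed)
f_after = f_ratios(transformed)
```

With these inputs the block ratio rises from about 1.21 to about 1.66 after transformation, while the treatment ratio remains far beyond any usual significance level in both cases.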

Consider next the effect of transformation on the significance of differences between the means for treatments as tested by the criterion, $t$, calculated with such estimates of mean square as the interaction of Table 11. For illustration, values of $t$, from the data on Leptinotarsa decemlineata (Experiment III), are shown in Table 12 for each possible comparison of treatments when untransformed data are used, when the transformation, $(x+\tfrac{1}{2})^{1/2}$, as suggested by Bartlett (1936a) is used, and when the transformation, $k^{-1/2}\sinh^{-1}\sqrt{kx}$, as suggested in the present paper is used. In order that the influence of the level of population under each


. 


Table 11. The analysis of variance of untransformed and transformed data in six experiments

                               Untransformed data        Transformed data
Variation              D.F.    Mean square    F          Mean square    F

Experiment I. P. nubilalis
Between blocks           9          92.8       1.21        0.565         1.66
Between treatments       6       2,839.0      36.96        7.51         22.03
Interaction             54          76.8        -          0.341          -

Experiment II. P. nubilalis
Between blocks           9         577.0        …          5.37         10.55
Between treatments       6       1,721.0        …          8.69         17.07
Interaction             54          86.9        -          0.509          -

Experiment III. L. decemlineata
Between blocks           6      20,172.0       2.26        4.67          3.03
Between treatments       3     390,932.0      43.77      111.6          72.16
Interaction             18       8,931.0        -          1.54           -

Experiment IV. L. decemlineata
Between blocks           6      12,960.0       2.06        2.06          1.96
Between treatments       3     124,054.0      20.40       20.40         19.41
Interaction             15       6,727.0        -          1.05           -

Experiment V. P. quinquemaculata
Between blocks           5          27.4       2.74        0.349         4.19
Between treatments       2         752.0      75.33       19.6         233.77
Interaction             10           9.98       -          0.083          -

Experiment VI. P. quinquemaculata
Between blocks           …          41.7        …          1.50          4.12
Between treatments       …          60.2        …          3.33          9.14
Interaction             26          10.1        -          0.365          -


Table 12. The values of $t$, in the comparison of means, as calculated from the untransformed and the transformed data of Experiment III on Leptinotarsa decemlineata

                                         t without          t from                t from
Comparison                Means for x    transformation     (x + 1/2)^{1/2}       k^{-1/2} sinh^{-1} sqrt(kx)

x̄_1., x̄_2.               362     30     +9.27**            +11.33**              +9.92**
x̄_1., x̄_3.               362    224     +3.84**            +3.52**               …
x̄_1., x̄_4.               362     11     +9.82**            +12.89**              +12.46**
x̄_2., x̄_3.                 …      …     …                  -7.81**               -7.81**
x̄_2., x̄_4.                 …      …     +0.54              +5.56                 +2.54*
x̄_3., x̄_4.               224     11     …                  +9.81**               …










treatment may be judged, there are shown, also in Table 12, the means for the untransformed data. The values of $t$ falling beyond the 0.01 level of significance have been marked with two asterisks and the values beyond the 0.05 level with one. It can be seen that the transformation resulted in a profound alteration in the conclusions. Apparently on account of the dependence of variance on mean in untransformed data, the pooled estimate of variance was originally too low for the treatments which resulted in high populations and too high for the treatments which resulted in low populations. Thus, in the comparison of the first and third treatments, which appeared to have the two highest surviving populations, the value of $t$ calculated from untransformed values was high. In the other extreme case, the comparison between the second and fourth treatments, the value of $t$, as calculated from untransformed values, was very low. It can be seen further, that the first transformation only secured in part the modification in the value of $t$ that was secured by the second transformation.


8. THE PROCEDURE OF TRANSFORMATION IN PRACTICE 


The methods which were found applicable in the preceding discussion will now be illustrated in the transformation of the data shown in Table 7 (Experiment VII) on Phlegethontius quinquemaculata, of the same type as the experiments previously discussed in the present paper. The steps in the analysis will be set out with the purpose of providing a model for procedure in estimating the constant, $k$, which will be used to effect a transformation of the data so that the analysis of variance may be made.

Supposing that the experiment has been laid out with a repetition of each treatment in each block, the procedure of estimating $k$ makes it first necessary to find the sum and the absolute difference of each pair of plots subjected to a given treatment in a given block and then to sum the sums, $\sum\sum (x_{ij1}+x_{ij2})$, and the sums squared, $\sum\sum (x_{ij1}+x_{ij2})^2$, and also the differences squared, $\sum\sum (x_{ij1}-x_{ij2})^2$, over all pairs and by substituting the results in (14) to find $k$. In the case being used for an illustration the two plots subjected to the first treatment in each block gave respectively 10 and 7, 20 and 14, 14 and 12, 10 and 23, 17 and 20, 14 and 13, so that

$$\sum_{j=1}^{N}(x_{1j1}+x_{1j2}) = (10+7)+(20+14)+(14+12)+\ldots = 174.$$

Similarly,

$$\sum_{j=1}^{N}(x_{1j1}+x_{1j2})^2 = (10+7)^2+(20+14)^2+\ldots = 5308$$

and similarly,

$$\sum_{j=1}^{N}(x_{1j1}-x_{1j2})^2 = (10-7)^2+(20-14)^2+\ldots = 228.$$
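
The three sums for the first treatment can be verified directly from the six pairs of counts quoted above. This is only an arithmetic check; the variable names are hypothetical.

```python
# Duplicate-plot counts for the first treatment, one pair per block,
# as quoted in the text for Experiment VII.
first_treatment = [(10, 7), (20, 14), (14, 12), (10, 23), (17, 20), (14, 13)]

sum_of_totals = sum(a + b for a, b in first_treatment)
sum_of_squared_totals = sum((a + b) ** 2 for a, b in first_treatment)
sum_of_squared_differences = sum((a - b) ** 2 for a, b in first_treatment)
```

These reproduce 174, 5308 and 228 respectively.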


Of course, in estimating $k$ the summations are not limited to one treatment but must be extended over all in the experiment. If this is done we find

$$\sum_{i=1}^{n}\sum_{j=1}^{N}(x_{ij1}+x_{ij2}) = 684, \quad \sum_{i=1}^{n}\sum_{j=1}^{N}(x_{ij1}+x_{ij2})^2 = 19{,}656, \quad \sum_{i=1}^{n}\sum_{j=1}^{N}(x_{ij1}-x_{ij2})^2 = 708.$$

From (14) we estimate

$$k = \frac{2(708-684)}{19{,}656} = 0.002,$$

and referring to Table 8, p. 250, use $k = 0.00$ as the nearest value occurring there. Of course, in this case, the transformation is simply $x^{1/2}$.


Now from the above result it will be possible to replace the observed values of Table 7 with the corresponding transformed values from the first column of Table 8. Thus in Table 7 replace in the first row: 10, 20, 14, 10, 17 and 14, by 3.16, 4.47, 3.74, 3.16, 4.12 and 3.74. With such transformed values we can now proceed to carry out a routine analysis of variance which will be facilitated by working with the sum for each pair of plots in a given block with a given treatment. For example, the final analysis of variance for Experiment VII would be carried out with the values of Table 13.
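
Where a printed table of transformed values is not to hand, the transformation can be computed directly. The sketch below assumes the form $x' = k^{-1/2}\sinh^{-1}\sqrt{kx}$ with the square root $x^{1/2}$ as the limiting case $k = 0$; the function name `beall_transform` is hypothetical, and negative $k$ is deliberately rejected since the formula is not defined for it.

```python
import math


def beall_transform(x, k):
    """Transform a count x with constant k >= 0.

    For k = 0 the transformation reduces to the square root of x.
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    if k == 0:
        return math.sqrt(x)
    return math.asinh(math.sqrt(k * x)) / math.sqrt(k)


# First row of counts quoted in the text, transformed with k = 0.
first_row = [10, 20, 14, 10, 17, 14]
transformed_row = [round(beall_transform(x, 0.0), 2) for x in first_row]
```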


Table 13. Transformed and summed values to be used in the analysis of
variance for Experiment VII on P. quinquemaculata


Treatment




















9. SUMMARY AND CONCLUSIONS


The foregoing work is a study of experimental results from seven field experiments on the control of insects. In such data, the standard deviation of the number of insects per plot varies with the mean. By the transformation, $x' = k^{-1/2}\sinh^{-1}\sqrt{kx}$, where $k$ is a constant and $x$ an observation, the data were put in a form for which the standard deviation approached a constant independent of the mean. The estimation of the one constant, $k$, necessary for the transformation was made possible by the design of the experiments with repetition of treatments within blocks. In practice, the transformation gave good results so that analysis of variance could be made. From the analysis of the transformed data, the results were found to differ markedly from those which would have been obtained from the untransformed data.


ACKNOWLEDGEMENTS 


The present work was suggested to the writer by Prof. E. S. Pearson to whom the writer is further indebted for advice as the work progressed. The writer is further beholden to Dr R. W. B. Jackson, to Dr B. L. Welch and to Dr M. S. Bartlett for criticism and advice. Dr G. M. Stirrett and Mr F. A. Stinson very kindly permitted the use of unpublished data from their experiments carried out at Delhi, Ontario, on Phlegethontius quinquemaculata Haw.


APPENDIX 


As has been said, the transformation of (10) was suggested by the method used by Tippett (1934, p. 61). The procedure is as follows.

It is required to find $x' = f(x)$, such that the standard deviation, $\sigma_{x'}$, of $x'$, shall be approximately constant. Let us write

$$f(x) = f(M) + f'(M)(x - M), \tag{19}$$

where $M$ is the expectation of $x$ and whence, approximately,

$$(x' - M') = f'(M)(x - M), \tag{20}$$

where $M'$ is the expectation of $x'$. Hence

$$\sigma_{x'} = f'(M)\,\sigma_x, \tag{21}$$

where $\sigma_x$ is the standard deviation of the observations, $x$. Replacing $\sigma_{x'}$ in (21) by a constant, $c$, as is the purpose of our operation, and substituting for $\sigma_x$ from equation (6), p. 247, we have

$$f'(M) = c\,(M + kM^2)^{-1/2}, \tag{22}$$

where $k$ is, as has been previously discussed, a constant peculiar to our data. Integrating in (22),

$$f(M) = 2ck^{-1/2}\sinh^{-1}\sqrt{kM}. \tag{23}$$

From (23) the form of the function suggested is $\sinh^{-1}\sqrt{kx}$, but it is wise instead to use $k^{-1/2}\sinh^{-1}\sqrt{kx}$, since the transformation then becomes identical, as shown in (11), with the established transformation, $x^{1/2}$, when $k = 0$.
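
The integration leading from (22) to (23) can be checked numerically by differentiating the integrated form and comparing it with the integrand. The sketch below assumes the variance law $\sigma_x^2 = M + kM^2$ used in (22) and takes $c = 1$ by default; the function names are hypothetical.

```python
import math


def f_integrated(m, k, c=1.0):
    """Integrated form (23): 2 c k^(-1/2) asinh(sqrt(k m))."""
    return 2.0 * c * math.asinh(math.sqrt(k * m)) / math.sqrt(k)


def f_prime_required(m, k, c=1.0):
    """Required derivative (22): c (m + k m^2)^(-1/2)."""
    return c / math.sqrt(m + k * m * m)


def central_difference(func, m, h=1e-5):
    """Symmetric finite-difference estimate of the derivative at m."""
    return (func(m + h) - func(m - h)) / (2.0 * h)


k = 0.08
numeric = central_difference(lambda m: f_integrated(m, k), 25.0)
required = f_prime_required(25.0, k)
```

The two quantities agree to well within the accuracy of the finite difference, which supports the reconstruction of (23) as the integral of (22).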

As Tippett (1934) says: 'This derivation is not mathematically sound, and the result is only justified if on application it is found to be satisfactory.' The writer would have hesitated to have used it had it not already led to useful transformations in cases analogous to the present, namely to $x^{1/2}$ where $x$ comes from a Poisson distribution, to $\sin^{-1}p^{1/2}$ where $p$ comes from a binomial distribution and, according to Tippett, to $\tanh^{-1}r$, where $r$ is the correlation coefficient.


REFERENCES

BARTLETT, M. S. (1936a). Square root transformation in analysis of variance. J. Roy. Statist. Soc. Suppl. 3, 68-78.
BARTLETT, M. S. (1936b). Some notes on insecticide tests in the laboratory and in the field. J. Roy. Statist. Soc. Suppl. 3, 186-94.
BEALL, G. (1935). Study of arthropod populations by the method of sweeping. Ecology, 16, 216-25.
BEALL, G. (1939). Methods of estimating the population of insects in a field. Biometrika, 30, 422-39.
BEALL, G. (1940). The transformation of data from entomological field experiments. Canadian Ent. 72, 108.
BEALL, G., STIRRETT, G. M. & CONNERS, I. L. (1939). A field experiment on the control of the European corn borer, Pyrausta nubilalis Hubn., by Beauveria Bassiana Vuill. II. Sci. Agric. 19, 531-4.
BECKER, G. F. & VAN ORSTRAND, C. E. (1931). Hyperbolic functions (4th reprint). Smithsonian Mathematical Tables.
CLAPHAM, A. R. (1936). Over-dispersion in grassland communities and the use of statistical methods in plant ecology. J. Ecol. 24, 232-51.
GEARY, R. C. (1935). The ratio of the mean deviation to the standard deviation as a test of normality. Biometrika, 27, 310-32.
NAYER, P. P. N. (1936). An investigation into the application of Neyman and Pearson's $L_1$ test, with tables of percentage limits. Statist. Res. Mem. 1, 38-51.
SNEDECOR, G. W. (1934). Calculation and Interpretation of Analysis of Variance and Covariance. Pp. 96. Ames, Iowa: Collegiate Press Inc.
'STUDENT' (1919). An explanation of deviations from Poisson's law in practice. Biometrika, 12, 211-15.
TIPPETT, L. H. C. (1934). Statistical methods in textile research. Part 2. Uses of the binomial and Poisson distributions. Shirley Inst. Mem. 13, 35-72.
WILLIAMS, C. B. (1937). The use of logarithms in the interpretation of certain entomological problems. Ann. Appl. Biol. 24, 404-14.


INTERPOLATION FOR FRESH PROBABILITY LEVELS
BETWEEN THE STANDARD TABLE LEVELS
OF A FUNCTION


By J. B. SIMAIKA


1. THE PROBLEM


A NUMBER of tables of probability functions exist, and more will no doubt before long be available, giving values for a variable $x$ corresponding to a limited number of simple probability levels $\alpha$. How far is it possible to obtain $x_\alpha$ for intermediate values of $\alpha$?

The variable may be put into standardized form as the ratio of the deviation from the mean to the standard deviation; these two quantities (i.e. the mean and standard deviation) are often easy to obtain, whereas the probability integral may require extensive computation. Denote by $u_\alpha$ the standardized variable so that

$$u_\alpha = \frac{x_\alpha - \text{mean } x}{\text{standard deviation of } x}. \tag{1}$$

The question we shall consider is this: having full and accurate tables relating $U_\alpha$ and $\alpha$ for a standardized normal variable, denoted by $U_\alpha$, can we use these values as auxiliary in obtaining $u_\alpha$ for any other function tabled only at a few probability levels and, having found an interpolating formula, what is its accuracy? In examining this point we shall compare the accuracy of the method with that of some other methods of deriving intermediate values of $u_\alpha$.


2. GENERAL APPROACH 


It will be useful to consider first how far a general theoretical approach will take us. Let the variable $x$ follow a probability law defined only by its cumulants $\kappa_r$ ($r = 1, 2, \ldots$); then the first four cumulants of the variable $u$ become 0, 1, $\gamma_1$, $\gamma_2$. It is known that the relation between $u_\alpha$ and $\alpha$ may be written symbolically


um NL de ] -e 
ποσο iat e A? dz, (2) 


while the same relation for a normal variable is 





Us. 1 ο 
erea 9 


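Using equations (2) and (3) it has been shown by Cornish & Fisher (1937) that $u_\alpha$ can be approximated to by a parabolic curve in $U_\alpha$ and vice versa. If we assume that $\kappa_r$ for $r > 4$ is negligible, this expression, using a fifth degree parabola, is

$$u_\alpha = A + BU_\alpha + CU_\alpha^2 + DU_\alpha^3 + EU_\alpha^4 + FU_\alpha^5, \tag{4}$$

where

$$A = -\tfrac{1}{6}\gamma_1 - \tfrac{1}{12}\gamma_1\gamma_2 + \tfrac{17}{324}\gamma_1^3,$$
$$B = 1 - \tfrac{1}{8}\gamma_2 + \tfrac{5}{36}\gamma_1^2 - \tfrac{29}{384}\gamma_2^2 + \tfrac{107}{288}\gamma_1^2\gamma_2 - \tfrac{1511}{7776}\gamma_1^4,$$
$$C = \tfrac{1}{6}\gamma_1 + \tfrac{5}{24}\gamma_1\gamma_2 - \tfrac{53}{324}\gamma_1^3,$$
$$D = \tfrac{1}{24}\gamma_2 - \tfrac{1}{18}\gamma_1^2 + \tfrac{1}{16}\gamma_2^2 - \tfrac{103}{288}\gamma_1^2\gamma_2 + \tfrac{211}{972}\gamma_1^4,$$
$$E = -\tfrac{1}{24}\gamma_1\gamma_2 + \tfrac{1}{27}\gamma_1^3,$$
$$F = -\tfrac{1}{128}\gamma_2^2 + \tfrac{7}{144}\gamma_1^2\gamma_2 - \tfrac{7}{216}\gamma_1^4.$$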


Now as $\gamma_1$ and $\gamma_2$ tend to zero, $B$ tends to unity and all other coefficients tend to zero, i.e. the curve of $u_\alpha$ as a function of $U_\alpha$ tends to the diagonal line

$$u_\alpha = U_\alpha.$$
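
The behaviour of the fifth degree polynomial can be explored numerically. The sketch below is an illustration only: it assumes the standard Cornish-Fisher coefficients truncated after the fourth cumulant, takes $\alpha$ as an upper-tail probability, and uses the well-known upper 5% point of $\chi^2$ with 10 degrees of freedom (18.307) as a reference. The function names are hypothetical.

```python
import math
from statistics import NormalDist


def cornish_fisher_coefficients(g1, g2):
    """Coefficients A..F of the quintic in U, cumulants above the 4th ignored."""
    A = -g1 / 6 - g1 * g2 / 12 + 17 * g1**3 / 324
    B = (1 - g2 / 8 + 5 * g1**2 / 36 - 29 * g2**2 / 384
         + 107 * g1**2 * g2 / 288 - 1511 * g1**4 / 7776)
    C = g1 / 6 + 5 * g1 * g2 / 24 - 53 * g1**3 / 324
    D = (g2 / 24 - g1**2 / 18 + g2**2 / 16
         - 103 * g1**2 * g2 / 288 + 211 * g1**4 / 972)
    E = -g1 * g2 / 24 + g1**3 / 27
    F = -g2**2 / 128 + 7 * g1**2 * g2 / 144 - 7 * g1**4 / 216
    return A, B, C, D, E, F


def standardized_point(U, g1, g2):
    """Evaluate u = A + B U + ... + F U^5."""
    coeffs = cornish_fisher_coefficients(g1, g2)
    return sum(c * U**power for power, c in enumerate(coeffs))


# Chi-square with nu degrees of freedom: gamma_1 = sqrt(8/nu), gamma_2 = 12/nu.
nu = 10
U_05 = NormalDist().inv_cdf(0.95)  # normal deviate for an upper tail of 0.05
u_05 = standardized_point(U_05, math.sqrt(8 / nu), 12 / nu)
chi2_05 = nu + math.sqrt(2 * nu) * u_05
normal_only = nu + math.sqrt(2 * nu) * U_05
```

With zero $\gamma_1$ and $\gamma_2$ the coefficients reduce to $B = 1$ and the rest zero. For the $\chi^2$ case the truncated polynomial comes much closer to the tabled point than the plain normal approximation, though not exactly, since the neglected higher cumulants of $\chi^2$ are not zero.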


Figs. 1-4 give these curves for different numbers of degrees of freedom for the commonly used statistical variables $\chi^2$, $\chi$, $t$ and $v$ (a transformation of $z$ referred to below), expressed in standardized form.

With regard to the coefficients in (4) it may be remarked that large values of $\gamma_2$ do not increase them as much as large values of $\gamma_1$. Furthermore, when $\gamma_1$ is zero, the coefficients $A$, $C$ and $E$ vanish and the expression (4) becomes

$$u_\alpha = BU_\alpha + DU_\alpha^3 + FU_\alpha^5,$$

or

$$u_\alpha / U_\alpha = B + DU_\alpha^2 + FU_\alpha^4. \tag{5}$$
These broad results suggest that a good method of interpolation, when both $\gamma_1$ and $\gamma_2$ exist, is a Lagrange formula through the points

$$(u_{\alpha_i}, U_{\alpha_i}) \quad (i = 1, 2, \ldots).$$

Remembering that $u_{\alpha_i} = (x_{\alpha_i} - \text{mean } x)/(\text{S.D. of } x)$, from the practical point of view the interpolation can be carried out more expeditiously and without loss of accuracy by using a Lagrangian formula through

$$(x_{\alpha_i}, U_{\alpha_i}) \quad (i = 1, 2, \ldots).$$

When, however, $\gamma_1$ is zero as in the case of the $t$-distribution it would be better to take the Lagrange formula through the points

$$(u_{\alpha_i}/U_{\alpha_i},\; U_{\alpha_i}^2) \quad \text{or alternatively through} \quad (x_{\alpha_i}/U_{\alpha_i},\; U_{\alpha_i}^2) \quad (i = 1, 2, \ldots),$$

the mean $x$ being zero.

The accuracy of the method is likely to depend on the value of $\gamma_1$ and $\gamma_2$. For example, as seen in Fig. 1, linear interpolation between, say, $U_{0.05}$ and $U_{0.10}$ will be more accurate with $\nu = 18$ than $\nu = 3$, the $\gamma$'s being smaller in the former case. Again, the curves will be more nearly linear if we take as variable

$$u = (\chi - \text{mean } \chi)/\sigma_\chi, \quad \text{rather than} \quad u = (\chi^2 - \text{mean } \chi^2)/\sigma_{\chi^2},$$

because the $\gamma$'s for the former are the smaller.
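
The Lagrangian interpolation through the points $(x_{\alpha_i}, U_{\alpha_i})$ can be sketched as follows. This is an illustration under stated assumptions: $\alpha$ is treated as an upper-tail probability, the tabled points are the familiar upper 5%, 2% and 1% points of $\chi^2$ with 10 degrees of freedom, and the 2.5% point (20.483) serves as the check value. The helper names are hypothetical.

```python
from statistics import NormalDist


def normal_deviate(alpha):
    """U_alpha for an upper-tail probability alpha (assumed convention)."""
    return NormalDist().inv_cdf(1.0 - alpha)


def lagrange_interpolate(xs, ys, x):
    """Value at x of the Lagrange polynomial through the points (xs, ys)."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        weight = 1.0
        for j, xj in enumerate(xs):
            if j != i:
                weight *= (x - xj) / (xi - xj)
        total += weight * yi
    return total


levels = [0.05, 0.02, 0.01]
chi2_points = [18.307, 21.161, 23.209]  # chi-square, 10 degrees of freedom
U_points = [normal_deviate(a) for a in levels]

# Interpolate the 2.5% point with U as the argument.
chi2_025 = lagrange_interpolate(U_points, chi2_points, normal_deviate(0.025))
```

Because standardization is a linear change of scale, interpolating $x_\alpha$ against $U_\alpha$ gives the same result as interpolating $u_\alpha$ and converting back, which is the practical shortcut the text describes.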




For the practical worker it will often be sufficient to use linear interpolation, i.e. to make use of two tabled probability levels only. For more accurate work three or more levels can be used, but an increase beyond this is not in fact likely to lead to a gain in accuracy which will be worth the labour.

Fig. 1. Relation between $u(\chi^2)$ and $U$. (Ordinate: standardized $\chi^2$-variate, $u$; abscissa: standardized normal variate, $U$; upper end of distribution.)


΄ 




3. COMPARISON OF METHODS

To compare the accuracy of the interpolation based on the polynomial expansion (4) with that obtained by other possible methods, we have considered the $\chi^2$, $t$ and $v$ (beta)-probability distributions. For each, different methods of interpolation have been devised. In some a transformation of the variable has been used, while in others a transformation of the argument or a new argument has been considered.

Fig. 2. Relation between $u(\chi)$ and $U$. (Ordinate: standardized $\chi$-variate, $u$; abscissa: standardized normal variate, $U$.)

The accuracy of each method in the range covered by the values $\alpha \le 0.10$ and $\alpha \ge 0.90$ has been tested in the following way: between any two consecutive probability levels a number, not less than three, of intermediate values have been interpolated. These values were chosen to be those which could be obtained accurately from some other table. The greatest deviations in each interval are given in Tables 1-3. The methods have been arranged according to the degree of accuracy obtained.


Fig. 3. Relation between $u(t)$ and $U$. (Ordinate: standardized $t$-variate, $u$; abscissa: standardized normal variate, $U$.)


` 


It remains to point out that, if the $n$ tabulated probability levels are denoted by

$$\alpha_1 < \alpha_2 < \ldots < \alpha_n \le 0.10,$$

and if the interpolation is to be carried out by using a quadratic or higher expression in the argument, it has been found that the interpolated value is always more accurate when the probability levels used include as many as possible of the probability levels below $\alpha$. Similarly, if

$$0.90 \le \alpha_1 < \alpha_2 < \ldots < \alpha_n,$$

it is better to use probability levels including as many as possible of those above $\alpha$.

[Fig. 4. Relation between u(v) and U, where $p(v) = C\,v^{\frac{1}{2}\nu_2 - 1}(1 - v)^{\frac{1}{2}\nu_1 - 1}$. Axes: standardized v-variate, u, against standardized normal variate, U.]


4. THE χ²-PROBABILITY FUNCTION

The standardized form of χ² used in Fig. 1 is

$$u(\chi^2) = \frac{\chi^2 - \nu}{\sqrt{2\nu}}, \qquad (5)$$

ν being the number of degrees of freedom. The γ₁ and γ₂ of this distribution are


J. B. SIMAIKA 269


√(8/ν) and 12/ν respectively. If we consider χ itself instead of χ² we find that its
standardized form has the approximate value, used in Fig. 2,


$$u(\chi) = \sqrt{2}\,\{\chi - \sqrt{\nu - \tfrac{1}{2}}\}, \qquad (7)$$


and its γ₁ and γ₂ are $(2\nu)^{-1/2} + O(\nu^{-3/2})$ and $O(\nu^{-2})$ respectively. Both these last
quantities are smaller than those for χ², which suggests that interpolation will
be more accurate using χ rather than χ². This can also be seen from Figs. 1 and 2.
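
The comparison of γ₁ and γ₂ for χ² and χ can be checked numerically. The sketch below is an illustration only: the function names are hypothetical, and the moments of χ are taken from the standard gamma-function expression for the raw moments of a chi variate rather than from the paper's series.

```python
import math

def chi_raw_moment(nu, k):
    """k-th raw moment of a chi variate with nu degrees of freedom."""
    log_ratio = math.lgamma((nu + k) / 2.0) - math.lgamma(nu / 2.0)
    return 2.0 ** (k / 2.0) * math.exp(log_ratio)

def chi_gammas(nu):
    """Exact (gamma1, gamma2) of chi, built from its first four raw moments."""
    m1, m2, m3, m4 = (chi_raw_moment(nu, k) for k in (1, 2, 3, 4))
    var = m2 - m1 ** 2
    mu3 = m3 - 3.0 * m1 * m2 + 2.0 * m1 ** 3
    mu4 = m4 - 4.0 * m1 * m3 + 6.0 * m1 ** 2 * m2 - 3.0 * m1 ** 4
    return mu3 / var ** 1.5, mu4 / var ** 2 - 3.0

def chi2_gammas(nu):
    """(gamma1, gamma2) of chi-squared: sqrt(8/nu) and 12/nu."""
    return math.sqrt(8.0 / nu), 12.0 / nu

# Degrees of freedom used in Table 1.
for nu in (3, 5, 9, 18):
    g1_sq, g2_sq = chi2_gammas(nu)
    g1_chi, g2_chi = chi_gammas(nu)
    print(nu, round(g1_sq, 2), round(g2_sq, 2), round(g1_chi, 2), round(g2_chi, 2))
```

For each ν the values for χ come out well below those for χ², in line with the argument for interpolating on χ.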

The χ² probability levels have been tabulated by Fisher (1941) for α = 0.01,
0.02, 0.05, 0.10, ..., 0.90, 0.95, 0.98, 0.99 and are given to three decimal places.
These levels were used in the interpolation.* The accuracy of the interpolated
values obtained by the eight methods detailed below was checked either from the
Tables of the Incomplete Gamma Function (Karl Pearson, 1922) or from Tables
for Statisticians and Biometricians, Part I, Table XII (Karl Pearson, 1930).

The greatest deviations, δₘ (m = 1, 2, ..., 8), obtained using in all eight different
methods, are given for ν = 3, 5, 9 and 18 and for intervals of α: (0.01, 0.02),
(0.02, 0.05), (0.05, 0.10), (0.90, 0.95), (0.95, 0.98) and (0.98, 0.99) in Table 1.


Method 1. χ² = A + B log α (α < 0.10), χ² = A + B log (1 − α) (α > 0.90).
Method 2. u(χ) = A + B log α (α < 0.10), u(χ) = A + B log (1 − α) (α > 0.90).
Method 3. u(χ²) = A + BU.
Method 4. u(χ²) − U = A + B log α (α < 0.10), u(χ²) − U = A + B log (1 − α) (α > 0.90).
Method 5. u(χ) = A + BU.
Method 6. χ² = A + B log α + C log² α (α < 0.10),
          χ² = A + B log (1 − α) + C log² (1 − α) (α > 0.90).
Method 7. u(χ²) = A + BU + CU².
Method 8. u(χ) = A + BU + CU².


The best linear interpolation is that provided by method 5 and the best 
quadratic one is given by method 8. Both these methods are those suggested by 
the general approach. 
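
As an illustration of the quadratic scheme of method 8, the following sketch passes a second-order Lagrangian polynomial in U through three tabled levels of χ and squares the result. The function name, the choice of ν = 5 and the three upper levels are assumptions made for illustration; the χ² and U figures are standard three- and four-decimal table entries, not values quoted in this section.

```python
import math

def lagrange_in_U(U_target, U_nodes, y_nodes):
    """Lagrangian interpolation of y at U_target through the given (U, y) nodes."""
    total = 0.0
    for i, (U_i, y_i) in enumerate(zip(U_nodes, y_nodes)):
        weight = 1.0
        for j, U_j in enumerate(U_nodes):
            if j != i:
                weight *= (U_target - U_j) / (U_i - U_j)
        total += weight * y_i
    return total

# Assumed illustration: nu = 5, tabled levels alpha = 0.95, 0.98 and 0.99.
U_nodes = [1.6449, 2.0537, 2.3263]
chi2_nodes = [11.070, 13.388, 15.086]
chi_nodes = [math.sqrt(c) for c in chi2_nodes]

# Interpolate chi (not chi-squared) at the normal deviate for alpha = 0.975.
chi_interp = lagrange_in_U(1.9600, U_nodes, chi_nodes)
chi2_method8 = chi_interp ** 2
print(round(chi2_method8, 4))
```

The three nodes are taken on the tail side of the required level, following the remark above that quadratic interpolation gains from including the more extreme tabled levels.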


* Certain additional levels are given in a recently published table computed by Catherine M.
Thompson (1941).







270 Interpolation for probability levels


Table 1. Greatest deviations in the interpolation of χ² and χ
(|δ| in units of 0.001; [?] marks an entry that is illegible in the source copy)

 ν    γ₁     γ₂    Interval       α         True χ²   |δ₁|  |δ₂|  |δ₃|  |δ₄|  |δ₅|  |δ₆|  |δ₇|  |δ₈|
 3   1.63   4.00   (0.02, 0.05)   0.02997    0.245      6     7    11    12     4     3     3     1
                   (0.05, 0.10)   0.07890    0.490     15     8    10     1     4     4     3     0
                   (0.90, 0.95)   0.92810    7.000      4     3    27    14   [?]     1     0     1
                   (0.95, 0.98)   0.97071    9.000    [?]    33    36    10     6     0     0   [?]
                   (0.98, 0.99)   0.98827   11.000      1    18    84   [?]     1     9     1     2
 5   1.26   2.40   (0.01, 0.02)   0.01353    0.632      8     5     6     3     2     6     0     0
                   (0.02, 0.05)   0.03340    0.949     23    13    15     3     5     5     1     0
                   (0.05, 0.10)   0.06160    1.265     19    11   [?]     0     3     4     0     1
                   (0.90, 0.95)   0.92808   10.119     11    32    26    12     5     3     1     1
                   (0.95, 0.98)   0.96544   12.017     18    39    32    12     6     9     1     0
                   (0.98, 0.99)   0.98679   14.230      5    18    17     6     4     9     1     1
 9   0.94   1.33   (0.01, 0.02)   0.01060    2.122      4     2     2     2     0     1   [?]     0
                   (0.02, 0.05)   0.03452    2.970     34    21    19   [?]     6     6     0     1
                   (0.05, 0.10)   0.07706    3.818     33    24    17     6     6     8     1     0
                   (0.90, 0.95)   0.93312   16.000     19    88    17    11   [?]     4     0     0
                   (0.95, 0.98)   0.96483   18.000     22    46    31    19     7     4     1     1
                   (0.98, 0.99)   0.98736   21.000      8    18    14     5     4     2     1     0
18   0.67   0.67   (0.01, 0.02)   0.01167    7.200     14     9     5     9     1     4     1     1
                   (0.02, 0.05)   0.02798    8.400     41    32    20     2     6     7     0     1
                   (0.05, 0.10)   0.07482   10.200     48    34    18     5     6    11     1     0
                   (0.90, 0.95)   0.93159   27.600     34    52    26    10   [?]     7     1     1
                   (0.95, 0.98)   0.96255   30.000     34    55    28    10     6     8     1     1
                   (0.98, 0.99)   0.98589   33.600     15    26    13     5     4     3     1     1

For methods associated with subscripts to δ, see p. 269.


Table 2. Greatest deviations in the interpolation of t
(only the interval and α columns are legible in the source copy; [?] marks an illegible digit)

Interval          α
(0.005, 0.01)     0.00692
(0.01, 0.025)     0.01490
(0.025, 0.05)     0.0[?]261
(0.05, 0.10)      0.07286

(0.005, 0.01)     0.00714
(0.01, 0.025)     0.01517
(0.025, 0.05)     0.03874
(0.05, 0.10)      0.06820

(0.005, 0.01)     0.00[?]25
(0.01, 0.025)     0.01639
(0.025, 0.05)     0.03844
(0.05, 0.10)      0.07246

For methods associated with subscripts to δ, see p. 271.




J. B. SIMAIKA 271
5. THE t-PROBABILITY FUNCTION
The standardized form of the t-probability function used in Fig. 3 is

$$u(t) = t\left(\frac{\nu - 2}{\nu}\right)^{1/2}, \qquad (8)$$

ν being the number of degrees of freedom. The values of γ₁ and γ₂ are zero and
6/(ν − 4) respectively.

Percentage levels for t have been tabulated by Fisher (1941) for α = 0.005,
0.01, 0.025, 0.05, 0.10, ... and are given to three decimal places. The level here
defined as α is half the figure given in Fisher's table, i.e.

$$\alpha = \int_{-\infty}^{t_\alpha} p(t)\,dt. \qquad (9)$$


The values of t used in checking those obtained by interpolation are taken
from Tables of the Incomplete Beta-Function (Karl Pearson, 1934), where

$$I_x(p, q) = 2\alpha, \qquad p = \tfrac{1}{2}\nu, \quad q = \tfrac{1}{2} \quad\text{and}\quad x = \frac{1}{1 + t^2/\nu}, \qquad (10)$$

ν being the number of degrees of freedom.
The greatest deviations found in the following eight methods are denoted
by δₘ (m = 1, 2, ..., 8), and are given in Table 2 for ν = 3, 6 and 10 and for the
intervals (0.005, 0.01), (0.01, 0.025), (0.025, 0.05), (0.05, 0.10).


Method 1. u(t) = A + BU.
Method 2. u(t) − U = A + B log α.
Method 3. t = A + B log α.
Method 4. u(t) = AU + BU³.
Method 5. u(t) = A + BU + CU².
Method 6. t = A + B log α + C log² α.
Method 7. u(t) − U = A + B log α + C log² α.
Method 8. u(t) = AU + BU³ + CU⁵.


Of the methods that make use of only two probability levels, method 4
is the one that gives the highest accuracy, and of those that make use of three
probability levels, method 8 is the best. These two methods are those suggested
by the general approach.


6. THE v OR BETA-DISTRIBUTION
This variable, which is related to R. A. Fisher's z, is defined as

$$v = \frac{S_2}{S_1 + S_2} = \frac{\nu_2}{\nu_2 + \nu_1 e^{2z}}, \qquad (11)$$








272 Interpolation for probability levels

where S₁ is a sum of squares of normal variates based on ν₁ degrees of freedom and
S₂ an independent sum of squares based on ν₂ degrees of freedom. The elementary
probability law of v is

$$p(v) = \{B(\tfrac{1}{2}\nu_2, \tfrac{1}{2}\nu_1)\}^{-1}\, v^{\frac{1}{2}\nu_2 - 1}(1 - v)^{\frac{1}{2}\nu_1 - 1}. \qquad (12)$$


Table 3. Greatest deviations in the interpolation of v
(|δ| in units of 0.00001; [?] marks an illegible entry and … an entry lost from the source copy)

ν₁  ν₂    γ₁      γ₂     Interval        α         True v    |δ₁|  |δ₂|  |δ₃|  |δ₄|  |δ₅|  |δ₆|  |δ₇|
 4  20  −0.92   +0.79   (0.005, 0.01)   0.00838   0.52000     43    16     8     7     6     5     8
                        (0.01, 0.05)    0.02607   0.59000    296    76    34    35     7    21     1
                        (0.05, 0.10)    0.06901   0.66000     79     5    28    14   [?]    10     1
 8  30  −0.60   +0.25   (0.005, 0.01)   0.00769   0.53000     25     4     7    16     9     5   [?]
                        (0.01, 0.05)    0.02711   0.59000    228    36    24    35     4    13     1
                        (0.05, 0.10)    0.06645   0.64000    [?]     4    16    16     2     5     2
15  30  −0.28   −0.12   (0.005, 0.01)   0.00666   0.41000     31     3     8     8     9     7     8
                        (0.01, 0.05)    0.02826   0.47000    240    40    22    42     3     9     3
                        (0.05, 0.10)    0.07463   0.52000     78     9     1    20     1   [?]     1
 8  15  −0.33   −0.26   (0.005, 0.01)   0.00677   0.30000     53    19    20    13     8     3     3
                        (0.01, 0.05)    0.02619   0.37000    418   128   114    67    11    22     2
                        (0.05, 0.10)    0.06972   0.44000    126    27    16    29     2    16     3
 4   5    …       …     (0.005, 0.01)     …         …         58     …     5     5     …
                        (0.01, 0.05)      …         …        534     …    19    12     …
                        (0.05, 0.10)      …         …        144     …     8    11     …

For methods associated with subscripts to δ, see p. 273.


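Low values of v correspond to high values of z. The standardized form of v
used in Fig. 4 is

$$u(v) = \{(\nu_1 + \nu_2)\,v - \nu_2\}\left\{\frac{\nu_1 + \nu_2 + 2}{2\nu_1\nu_2}\right\}^{1/2}, \qquad (13)$$

and the values of γ₁ and γ₂ are

$$\gamma_1 = \frac{2(\nu_1 - \nu_2)}{\nu_1 + \nu_2 + 4}\left\{\frac{2(\nu_1 + \nu_2 + 2)}{\nu_1\nu_2}\right\}^{1/2}, \qquad
\gamma_2 = \frac{12\{\nu_1^2(\nu_1 + 2) + \nu_2^2(\nu_2 + 2) - 2\nu_1\nu_2(\nu_1 + \nu_2 + 4)\}}{\nu_1\nu_2(\nu_1 + \nu_2 + 4)(\nu_1 + \nu_2 + 6)}. \qquad (14)$$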
The cases considered here are those most generally met in tests of significance
with ν₁ < ν₂ and therefore γ₁ < 0. γ₂ is sometimes positive and sometimes negative.
Tables giving values of v for α = 0.005, 0.01, 0.025, 0.05, 0.10, 0.25 and 0.50
to five decimal places have recently been published (Catherine M. Thompson,
1941). The corresponding upper probability levels can be obtained by entering
the tables with ν₁ and ν₂ transposed and taking 1 − v for v.
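
The standardization of v and its γ₁, γ₂ can be sketched numerically as follows. This is an illustration under stated assumptions: v is treated as a beta variate with parameters ½ν₂ and ½ν₁, the formulae are the standard beta-distribution moment expressions written in terms of ν₁ and ν₂, and the function names are hypothetical.

```python
import math

def v_gammas(nu1, nu2):
    """(gamma1, gamma2) of v, assuming v is Beta(nu2/2, nu1/2)."""
    s = nu1 + nu2
    g1 = 2.0 * (nu1 - nu2) / (s + 4.0) * math.sqrt(2.0 * (s + 2.0) / (nu1 * nu2))
    numerator = nu1 ** 2 * (nu1 + 2) + nu2 ** 2 * (nu2 + 2) - 2 * nu1 * nu2 * (s + 4)
    g2 = 12.0 * numerator / (nu1 * nu2 * (s + 4) * (s + 6))
    return g1, g2

def u_of_v(v, nu1, nu2):
    """Standardized v: (v - mean) / standard deviation under the same assumption."""
    s = nu1 + nu2
    return (s * v - nu2) * math.sqrt((s + 2.0) / (2.0 * nu1 * nu2))

# Pairs of degrees of freedom that are legible in Table 3.
for nu1, nu2 in ((4, 20), (15, 30), (8, 15)):
    g1, g2 = v_gammas(nu1, nu2)
    print(nu1, nu2, round(g1, 2), round(g2, 2))
```

The printed values can be set beside the γ₁ and γ₂ columns of Table 3.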


` 


J. B. SIMAIKA 273 


The values of v used in checking the interpolation are taken from the Tables
of the Incomplete Beta-Function (Karl Pearson, 1934).
The greatest deviations δₘ (m = 1, 2, ..., 7), found in the following seven
methods are given in Table 3 for the following pairs of values of ν₁ and ν₂:
(4, 20), (8, 30), (15, 30), (8, 15), (4, 5), and for the intervals (0.005, 0.01), (0.01,
0.05), (0.05, 0.10) of α.
Method 1. v = A + B log α (α < 0.10), v = A + B log (1 − α) (α > 0.90).
Method 2. u(v) − U = A + B log α (α < 0.10), u(v) − U = A + B log (1 − α) (α > 0.90).
Method 3. u(v) = A + BU.
Method 4. v = A + B log α + C log² α (α < 0.10),
          v = A + B log (1 − α) + C log² (1 − α) (α > 0.90).
Method 5. u(v) − U = A + B log α + C log² α (α < 0.10),
          u(v) − U = A + B log (1 − α) + C log² (1 − α) (α > 0.90).
Method 6. u(v) = A + BU + CU².
Method 7. u(v) = A + BU + CU² + DU³.


Here also the best linear method and the best quadratic one are those suggested
by the general approach, namely, methods 3 and 6. Method 7, a cubic in
U, was used only because the great accuracy of the tabulated v-function justified
the computation. This cubic interpolation gives errors of the order of 0.0001
even for numbers of degrees of freedom as small as ν₁ = 4, ν₂ = 5. If the interval
between α = 0.01 and 0.05 were broken into two parts at 0.025 (a level given in
the Thompson tables) the corresponding errors δₘ would be considerably reduced.


7. SOME NUMERICAL EXAMPLES, USING LINEAR INTERPOLATION


(a) Interpolation for a χ² level. Suppose that we calculate the upper 2½ %
level (α = 0.975) for χ² with ν = 5 degrees of freedom, using tabled values of the
upper 2 % and 5 % levels. We require the 2 %, 2½ %, and 5 % levels of the
standardized normal variable as well as the two χ² levels. The relevant data are shown
in the following table; as pointed out above, there is no need to calculate the
standardized form of either χ² or χ.








α        U_α       χ²_α      χ_α
0.98     2.0537    13.388    3.659
0.975    1.9600    ?         ?
0.95     1.6449    11.070    3.327





. 


274 Interpolation for probability levels 
By linear interpolation, i.e. using method 3 of p. 269, we have

$$\chi^2_{0.975} = 11.070 + (13.388 - 11.070) \times \frac{1.9600 - 1.6449}{2.0537 - 1.6449} = 12.857.$$

Interpolating similarly for χ, i.e. using method 5 of p. 269, we find
χ₀.₉₇₅ = 3.583 or χ²₀.₉₇₅ = 12.838.
The correct value taken from Miss Thompson's table (1941) is
χ²₀.₉₇₅ = 12.8325.

It is seen, as expected on theoretical grounds and as evidenced in the comparisons
of Table 1, that method 5 is the more accurate of the two.
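
The arithmetic of example (a) can be reproduced with a few lines of code. This is a sketch only: the helper name is hypothetical, and the inputs are the tabled figures quoted in the example.

```python
import math

def linear_in_U(U_target, U_lo, y_lo, U_hi, y_hi):
    """Linear interpolation of y against the normal deviate U."""
    return y_lo + (y_hi - y_lo) * (U_target - U_lo) / (U_hi - U_lo)

# Normal deviates for alpha = 0.95, 0.975, 0.98 and chi-squared levels for nu = 5.
U_95, U_975, U_98 = 1.6449, 1.9600, 2.0537
chi2_95, chi2_98 = 11.070, 13.388

# Method 3: chi-squared taken as linear in U.
chi2_method3 = linear_in_U(U_975, U_95, chi2_95, U_98, chi2_98)

# Method 5: chi taken as linear in U, then squared.
chi_method5 = linear_in_U(U_975, U_95, math.sqrt(chi2_95), U_98, math.sqrt(chi2_98))
chi2_method5 = chi_method5 ** 2

print(round(chi2_method3, 3), round(chi2_method5, 3))
```

Because u(χ²) and u(χ) are linear in χ² and χ respectively, the interpolation can be carried out on the unstandardized quantities, as the example notes. Carrying unrounded square roots through method 5 can move the last figure by a unit relative to the hand calculation.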

(b) Interpolation for a t level. Suppose that we calculate the value of t
corresponding to α = 0.0125 (as defined in equation (9)), with ν = 6, using the tabled
levels for α = 0.005 and 0.025. The data required are shown below, the values of
t being taken from the table on p. 300 of this issue.








Again, it is not necessary to calculate the standardized values of t_α, for even
when using the relation of method 4, which assumes

$$\frac{t}{\sigma_t U} = A + BU^2, \qquad (15)$$

the transference of σ_t to the right-hand side of the equation will only modify
the constants A and B whose values are not directly determined in the
interpolation process.
Using method 1 (t a linear function of U), it is found that

t₀.₀₁₂₅ = −3.023.
Using method 4 (t/U a linear function of U²), it is found that
t₀.₀₁₂₅ = −2.979.


The correct value is −2.969. As can be seen in Fig. 3, from the stretch of the
curve for ν = 6 between α = 0.975 and 0.995, the interval chosen is too long for
satisfactory linear interpolation. The use of formula (15) improves matters, but
is still not satisfactory. With Fisher's tables we could, of course, interpolate
between the levels α = 0.010 and 0.025; doing this with method 4, it is found that

t₀.₀₁₂₅ = −2.971,
a distinctly better value.
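
The three interpolated values of example (b) can be checked as follows. This is a sketch under stated assumptions: the data table of the example is not legible in this copy, so the t levels for ν = 6 and the normal deviates below are standard three- and four-decimal table entries supplied for illustration, and the function names are hypothetical.

```python
def linear(x, x_lo, y_lo, x_hi, y_hi):
    """Plain linear interpolation of y against x."""
    return y_lo + (y_hi - y_lo) * (x - x_lo) / (x_hi - x_lo)

def t_method1(U, U_a, t_a, U_b, t_b):
    """Method 1: t taken as a linear function of U."""
    return linear(U, U_a, t_a, U_b, t_b)

def t_method4(U, U_a, t_a, U_b, t_b):
    """Method 4: t/U taken as a linear function of U squared."""
    ratio = linear(U * U, U_a * U_a, t_a / U_a, U_b * U_b, t_b / U_b)
    return U * ratio

# Assumed standard table entries (lower tail, nu = 6).
U_005, U_010, U_0125, U_025 = -2.5758, -2.3263, -2.2414, -1.9600
t_005, t_010, t_025 = -3.707, -3.143, -2.447

wide_m1 = t_method1(U_0125, U_025, t_025, U_005, t_005)
wide_m4 = t_method4(U_0125, U_025, t_025, U_005, t_005)
narrow_m4 = t_method4(U_0125, U_025, t_025, U_010, t_010)
print(round(wide_m1, 3), round(wide_m4, 3), round(narrow_m4, 3))
```

The narrower interval (0.010, 0.025) brings method 4 closest to the correct value −2.969, as the text states.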


J. B. SIMAIKA 275 


(c) Interpolation for a beta-distribution percentage level. Take the case ν₁ = 8,
ν₂ = 30 and suppose it is wished to find the 2½ % level from a knowledge of the
1 % and 5 % levels. The data required are as follows, v₀.₀₁ and v₀.₀₅ being taken
from Miss Thompson's tables.

















Then, using linear interpolation, i.e. method 3 of p. 273, we have

$$v_{0.025} = 0.62332 - (0.62332 - 0.54170) \times \frac{1.9600 - 1.6449}{2.3263 - 1.6449} = 0.58558.$$


The correct value taken from the same tables is 0.58582. It will be seen from
Fig. 4 that the intervals α = 0.005, 0.010, 0.025, 0.050, 0.100 of Miss Thompson's
tables are likely to form a satisfactory framework for subtabulation if this is
needed.


8. SUBTABULATION OF EXISTING TABLES


These methods have been used to produce the following enlargements of
existing tables; but it is not at present possible to arrange for their publication.

(a) Table of χ² percentage levels. Method 7 (p. 269) was used, as it is almost as
accurate as method 8 and less laborious. The table calculated gives χ² to 3
decimal places for ν = 1 (1) 30, and for α = 0.010 (0.002) 0.020; 0.020 (0.005) 0.050;
0.05 (0.01) 0.10; and for the corresponding levels at the upper end of the
distribution, i.e. for α′ = 1 − α.

(b) Table of t percentage levels. Method 8 (p. 271) was used for ν = 3, ..., 6, and
method 6 for ν = 7 (1) 30; 40, 60 and 120. Exact levels were calculated for ν = 1, 2.
The tables were computed to 3 decimal places and for

α = 0.005 (0.001) 0.010; 0.0100 (0.0025) 0.0250; 0.025 (0.005) 0.050; 0.05 (0.01) 0.10.


9. INTERPOLATION FOR THE PROBABILITY INTEGRAL α, GIVEN u_α


As we have already mentioned, the variable U_α can be expressed as a
polynomial in u_α. This suggests that interpolation for U_α, given u_α, is as easy and as
accurate as the interpolation discussed above. Hence to interpolate for α, given
the value of u_α and certain tabled levels u_α₁, u_α₂, ..., we first interpolate for U_α
and then find the value of α from appropriate tables of the normal probability
integral, e.g. Tables for Statisticians and Biometricians, Part I, Table II.
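
The inverse procedure can be sketched in the same way. The example below is an illustration under stated assumptions: it takes ν = 5 and the χ² levels of example (a), interpolates U linearly in χ, and uses the normal integral from Python's standard library in place of a printed table; the function name is hypothetical.

```python
import math
from statistics import NormalDist

def interpolate_U(y, y_lo, U_lo, y_hi, U_hi):
    """Linear interpolation of the normal deviate U against the tabled variate y."""
    return U_lo + (U_hi - U_lo) * (y - y_lo) / (y_hi - y_lo)

# Tabled chi-squared levels for nu = 5 at alpha = 0.95 and 0.98, with their deviates.
chi_lo, U_lo = math.sqrt(11.070), 1.6449
chi_hi, U_hi = math.sqrt(13.388), 2.0537

# Observed value whose probability integral is wanted (the 0.975 level of example (a)).
chi2_observed = 12.8325
U_alpha = interpolate_U(math.sqrt(chi2_observed), chi_lo, U_lo, chi_hi, U_hi)
alpha = NormalDist().cdf(U_alpha)
print(round(alpha, 4))
```

The recovered α should lie close to 0.975, the level at which 12.8325 was tabled.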


276 Interpolation for probability levels


10. CONCLUSION


It has been shown how accurate values of the probability levels of a statistical
variate may be interpolated between standard tabled values by using the
standardized normal variate as auxiliary. For many purposes linear interpolation is
adequate; for others a second order Lagrangian formula may be preferred. The
accuracy of the result depends, of course, on the closeness of the actual
probability law to the normal law and on the size of the intervals between the tabled
levels. The method has been illustrated on examples from the χ², t and v (beta)
distributions.

The method has been used to provide a subtabulation of existing tables of
percentage points for χ² and t, but it is not possible to have these tables printed
with the present contribution.


Finally I should like to express my thanks to Dr B. L. Welch of University
College, London, for originally suggesting the problem to me.


REFERENCES
CORNISH, E. A. & FISHER, R. A. (1937). Moments and cumulants in the specification of
distributions. Rev. Inst. Int. Statist. no. 4.
FISHER, R. A. (1941). Statistical Methods for Research Workers, 8th Edition. Edinburgh:
Oliver & Boyd.
PEARSON, K. (1922). Tables of the Incomplete Gamma-Function.
PEARSON, K. (1930). Tables for Statisticians and Biometricians, Part I.
PEARSON, K. (1934). Tables of the Complete and Incomplete Beta-Function.
THOMPSON, CATHERINE M. (1941). Biometrika, 32, 168-81.


PARTIAL RANK CORRELATION 
By M. G. KENDALL 


1. In interpreting an observed dependence between two qualities we are 
constantly faced with the question whether an association (correlation) of A 
with B is really due to the associations (correlations) of each with a third quality C. 
This has led naturally to the theories of partial association and correlation, 
which attempt to decide the matter by the consideration of subpopulations in 
which the variation of C is eliminated. An analogous problem arises in ranking 
work but, so far as I know, has not previously been considered. For example, 
if a number of men are ranked according to mathematical and musical aptitude 
and there appears a significant rank correlation, it is natural to inquire whether
this may be attributable to the correlation of both with some more fundamental 
quality such as intelligence. The object of this paper is to propose a coefficient 
of partial rank correlation which has a natural meaning and may be found useful 
for investigations requiring this type of decision. 

2. As a preliminary it may be worth examining what can be done in this
direction with the Spearman rank correlation coefficient ρ. If there are three
rankings denoted by 1, 2 and 3, we may find the three coefficients ρ₁₂, ρ₁₃, ρ₂₃.
It is tempting to apply to these coefficients the formulae of product moment
partial correlation such as

$$\rho_{23.1} = \frac{\rho_{23} - \rho_{12}\rho_{13}}{\{(1 - \rho_{12}^2)(1 - \rho_{13}^2)\}^{1/2}} \qquad (1)$$

and to define ρ₂₃.₁ as the partial rank correlation of 2 and 3 'when 1 is constant'.
There is clearly very little justification for such a procedure, and it is far from easy
to explain just what ρ₂₃.₁ means. In fact, the only defence of formula (1) that
can bear critical examination is, I think, that it is an approximation to a second
possibility, as follows:

3. There can be no such thing as a rank correlation in a continuous population
(the members of which are not even denumerable) but we can speak with genuine
meaning of a grade correlation. A well-known result due to Karl Pearson states
that in a normal bivariate population with correlation ρ, the grade correlation
ρ_g is given by

$$\rho = 2\sin\left(\tfrac{1}{6}\pi\rho_g\right). \qquad (2)$$

The Spearman coefficient may be regarded as a sample grade correlation. If,
therefore, we take it as an estimate of ρ_g we may find the normal correlation ρ from (2).
For three rankings we may then obtain the three values of ρ, apply the ordinary
product moment partial formula, and so obtain a partial coefficient. Since x and
2 sin(πx/6) do not


278 Partial rank correlation 


differ by more than a small amount for |x| ≤ 1, we might even apply formula
(1) direct to the values of the Spearman coefficient without bothering to transform
them by equation (2).
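
How small the difference between x and 2 sin(πx/6) is can be seen from a short numerical check. The sketch below is an illustration only; the function name and the grid spacing are assumptions.

```python
import math

def normal_rho_from_grade(rho_g):
    """Relation (2): product-moment correlation implied by a grade correlation."""
    return 2.0 * math.sin(math.pi * rho_g / 6.0)

# Largest gap between x and 2 sin(pi x / 6) on a grid over -1 <= x <= 1.
grid = [k / 1000.0 for k in range(-1000, 1001)]
max_gap = max(abs(normal_rho_from_grade(x) - x) for x in grid)
print(round(max_gap, 4))
```

The gap vanishes at x = 0 and x = ±1 and stays below 0.02 in between, which is the sense in which formula (1) may be applied directly to the sample coefficients.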
4. Such a procedure, again, is open to fairly obvious objections. Apart from
the all-too-facile assumption of normality and the error involved in using
Spearman's ρ from a small sample to estimate the grade correlation in a parent, the
fact remains that we arrive, not at a partial rank coefficient, but at an estimate 
of a partial product moment coefficient in a normal population. 

Perhaps there are cases where this is a reasonable objective based on reason- 
able assumptions but it is evidently unsatisfactory for general ranking purposes. 

5. In a previous paper (1938) I defined an alternative coefficient of rank
correlation τ which may be generalized to include the case when pairs of
individuals are compared separately (Kendall & Babington Smith, 1940). It will
be convenient for present purposes to redefine τ in a slightly different manner
so that the results obtained below may again be immediately generalized to the
case of paired comparisons. Consider the two rankings of six:


1:   1  4  3  2  6  5
2:   3  2  4  1  5  6                                        (3)
There are $\binom{6}{2} = 15$ possible pairs of ranks in each ranking, viz. 12, 13, ..., 16, 23,
24, ..., 56. We write them down as in expression (4) below. Any order of the
pairs will serve, and it is immaterial whether any pair is written as ab or ba; but
for practical convenience they may be written in the natural order indicated in
the last sentence but one. This arrangement I call the recorded order.

We then consider the occurrence of each pair in the ranking 1. If a pair occurs
in that ranking in the order in which we have recorded it, we write a plus below
the recorded order underneath the pair concerned; in the contrary case we write
a minus. Ranking 1 of expression (3) will then give

Recorded order: (12) (13) (14) (15) (16) (23) (24) (25) (26) (34) (35) (36) (45) (46) (56)
Ranking 1:        +    +    +    +    +    −    −    +    +    −    +    +    +    +    −
                                                                                      (4)

Here, for example, the pair (15) occurs in that order in ranking 1 and so is
denoted by a +, whereas the pair (24) occurs as 42 and is denoted by a −.

Consider now ranking 2. The members of ranking 1 which are ranked 1, 2
correspond to members in ranking 2 ranked as 3, 1. This is in the reverse order
to that of the pair 12 in the recorded order, so (starting a new row of signs
corresponding to ranking 2) we write a minus under the recorded pair (12). Similarly,
the pair in ranking 2 corresponding to 15 in ranking 1 is 36. This is in the same
order as the recorded pair, so we write a plus below the existing plus under the
recorded pair (15). The pair in ranking 2 corresponding to 23 in ranking 1 is 41.
This is in the reverse order of the recorded pair, so we write a minus below the


M. G. KENDALL 279 
recorded pair (23) in the row of signs corresponding to ranking 2. And so on. 


This takes rather a long time to explain but the process is really very simple.


The array corresponding to expressions (3) is then

Recorded order: (12) (13) (14) (15) (16) (23) (24) (25) (26) (34) (35) (36) (45) (46) (56)
Ranking 1:        +    +    +    +    +    −    −    +    +    −    +    +    +    +    −
Ranking 2:        −    +    −    +    +    −    −    +    +    +    +    +    +    +    +
                                                                                      (5)


Now in expression (5) there are eleven cases in which both rankings have the
same sign and therefore 15 − 11 = 4 in which they have the opposite sign. The
coefficient τ is then given by

$$\tau_{12} = \frac{11 - 4}{15} = \frac{7}{15}.$$

Generally, if there are, in two rankings of n arrayed as above, S₁ cases of the same
sign and S₂ of opposite sign,

$$\tau = \frac{2(S_1 - S_2)}{n(n-1)} = \frac{4S_1}{n(n-1)} - 1 = 1 - \frac{4S_2}{n(n-1)}. \qquad (6)$$


If we arrange ranking 1 in the natural order 1, ..., n, then every case in which
there are the same signs in expression (5) corresponds to a case in which pairs in
ranking 2 are in the natural order; and every case of different sign to one in which
the pairs in ranking 2 are in the reverse of the natural order. The definition of (6)
thus accords with the one originally given in my 1938 paper. It is often
convenient to take one ranking to be the natural order 1, ..., n so that the first row
of signs in (5) are all positive. For instance, on rearrangement of the rankings in
(3) we have

1:   1  2  3  4  5  6
2:   3  1  4  2  6  5                                        (7)

and the array of paired comparisons becomes

Recorded order: (12) (13) (14) (15) (16) (23) (24) (25) (26) (34) (35) (36) (45) (46) (56)
Ranking 1:        +    +    +    +    +    +    +    +    +    +    +    +    +    +    +
Ranking 2:        −    +    −    +    +    +    +    +    +    −    +    +    +    +    −
                                                                                      (8)


Here again S₁ = 11, S₂ = 4, τ₁₂ = 7/15.
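
The sign-counting definition of τ can be sketched in code. This is an illustration under stated assumptions: ranking 2 of expression (3) is read as 3 2 4 1 5 6 (the row that rearranges to expression (7)), pairs are enumerated by position rather than in the recorded order, which leaves the counts S₁ and S₂ unchanged, and the function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

def sign_row(ranking):
    """+1 where a pair of members is in ascending rank order, -1 otherwise."""
    positions = range(len(ranking))
    return [1 if ranking[i] < ranking[j] else -1
            for i, j in combinations(positions, 2)]

def tau_from_signs(ranking_a, ranking_b):
    """Return (S1, S2, tau) from the agreement of the two rows of signs."""
    row_a, row_b = sign_row(ranking_a), sign_row(ranking_b)
    S1 = sum(1 for a, b in zip(row_a, row_b) if a == b)
    S2 = len(row_a) - S1
    n = len(ranking_a)
    return S1, S2, Fraction(2 * (S1 - S2), n * (n - 1))

ranking_1 = [1, 4, 3, 2, 6, 5]
ranking_2 = [3, 2, 4, 1, 5, 6]   # assumed reading of the second row of (3)
S1, S2, tau_12 = tau_from_signs(ranking_1, ranking_2)
print(S1, S2, tau_12)
```

Rearranging both rankings so that the first is in natural order, as in expression (7), gives the same counts.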


6. Consider now three rankings, of which the first may be taken to be the
natural order 1, ..., n, for example,

1:   1  2  3  4  5  6
2:   3  1  4  2  6  5
3:   4  2  1  6  3  5                                        (9)


280 Partial rank correlation 


The corresponding array of pairs is

Recorded order: (12) (13) (14) (15) (16) (23) (24) (25) (26) (34) (35) (36) (45) (46) (56)
Ranking 1:        +    +    +    +    +    +    +    +    +    +    +    +    +    +    +
Ranking 2:        −    +    −    +    +    +    +    +    +    −    +    +    +    +    −
Ranking 3:        −    −    +    −    +    −    +    +    +    +    +    +    −    −    +
                                                                                      (10)
For the coefficients τ we have

$$\tau_{12}\ (\text{as above}) = \tfrac{7}{15}, \qquad \tau_{13} = \tfrac{3}{15}, \qquad \tau_{23} = -\tfrac{1}{15}, \qquad (11)$$

in the last case taking as usual the number of cases in which pairs in rankings
2 and 3 have the same sign.
Consider now the fourfold table setting out the occurrences of + and −
signs in the rows of array (10) corresponding to rankings 2 and 3:

                        Ranking 3
                     +       −     Total
Ranking 2     +      6       5      11
              −      3       1       4
          Total      9       6      15

Here, for example, there are six cases in which pairs of ranks have the same
(positive) sign; five in which ranking 2 is positive while ranking 3 is negative,
and so on.

Generally, if for three rankings of n the table is of the form

                        Ranking 3
                     +       −     Total
Ranking 2     +      a       b     a + b
              −      c       d     c + d
          Total    a + c   b + d     N
                                                             (12)




















M. G. KENDALL . 281 
I define the partial rank correlation of 2 and 3 on 1 as

$$\tau_{23.1} = \frac{ad - bc}{\{(a+b)(c+d)(a+c)(b+d)\}^{1/2}} \qquad (13)$$

$$\phantom{\tau_{23.1}} = \left(\frac{\chi^2}{N}\right)^{1/2}, \qquad (14)$$

where $N = \binom{n}{2}$ and χ² is the ordinary mean square contingency for the
fourfold table.
In the particular case here considered the coefficient is

$$\tau_{23.1} = \frac{6 - 15}{(11 \cdot 4 \cdot 9 \cdot 6)^{1/2}} = -0.18,$$

as compared with τ₂₃ = −0.067.
In the case when ranking 1 is not the natural order 1, ..., n the same
principles apply, but in considering the communalities of 2 and 3 we count as
contributing to the a-cell in (12) the pairs which are themselves of the same sign and
are also of the same sign as the term in ranking 1; and so on.
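
The construction of the fourfold table and of partial τ can be sketched as follows. This is an illustration under stated assumptions: ranking 3 is read as 4 2 1 6 3 5 (the ranking implied by the third row of signs in array (10)), the cells are lettered so that a + b counts agreements of ranking 2 with ranking 1 and a + c those of ranking 3 with ranking 1, and the function names are hypothetical.

```python
from itertools import combinations
from math import sqrt

def fourfold(r1, r2, r3):
    """Cells (a, b, c, d) counting agreement of rankings 2 and 3 with ranking 1."""
    a = b = c = d = 0
    for i, j in combinations(range(len(r1)), 2):
        reference = r1[i] < r1[j]
        agrees_2 = (r2[i] < r2[j]) == reference
        agrees_3 = (r3[i] < r3[j]) == reference
        if agrees_2 and agrees_3:
            a += 1
        elif agrees_2:
            b += 1
        elif agrees_3:
            c += 1
        else:
            d += 1
    return a, b, c, d

def partial_tau(a, b, c, d):
    """Equation (13): coefficient of association in the fourfold table."""
    return (a * d - b * c) / sqrt((a + b) * (c + d) * (a + c) * (b + d))

ranking_1 = [1, 2, 3, 4, 5, 6]
ranking_2 = [3, 1, 4, 2, 6, 5]
ranking_3 = [4, 2, 1, 6, 3, 5]   # assumed reading, recovered from the signs in (10)

cells = fourfold(ranking_1, ranking_2, ranking_3)
tau_23_1 = partial_tau(*cells)
print(cells, round(tau_23_1, 3))
```

Comparing each pair with ranking 1, rather than with the natural order, is what allows the same function to be used when ranking 1 is not in natural order.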

7. The partial τ defined in equation (13) is a coefficient of association in
the 2 × 2 table suggested by Yule (1912). When the attributes of the table are
independent and only in this case it is zero. It is +1 only if b and c are both zero
(i.e. if the two rankings agree in all pairs and hence are identical) and −1 only
if a and d are both zero (i.e. if the two rankings disagree in all pairs and one is
the reverse of the other). In this latter property of attaining unity only when
diagonally opposite cells are both empty it differs from two other coefficients
proposed by Yule.*

Thus partial τ as defined is a measure of the association of agreements of the
rankings 2 and 3 when compared in pairs with ranking 1. From this viewpoint
it will be seen that the use of the word 'partial' conforms to that in the theories
of association and correlation. The partial associations of A and B in a third
population containing C and γ are those of AC and BC or of Aγ and Bγ. The
partial association of ranks is that of pairs of agreements of 21 and 31. The partial
correlation of 2 and 3 when 1 is constant is paralleled by the partial rank
correlation of 2 and 3 when 1 is in the natural order 1, ..., n. If partial τ is unity in
absolute value the rankings coincide or one is the reverse of the other, whether
they agree with ranking 1 or not, so that the coefficient fulfils its proper function
of measuring the relationship between 2 and 3 independently of the influence of 1.

* Yule himself arrived at the coefficient by considering product-moments in a 2 × 2 table.
Karl Pearson & Heron (1913) mistook Yule's intention and thought the coefficient was proposed
as an estimate of the correlation in a normal population whose frequencies were given by a double
dichotomy in the 2 × 2 table. Their long memoir is mainly devoted to advocating the alternative
merits of tetrachoric r. The two things, as Yule has emphasized, are quite different. I mention
the point to make it clear that the Pearson-Heron criticisms of Yule's coefficient, even if not
misfounded, do not affect the above work, since I use the coefficient purely as a measure of association
in the fourfold table.


- 


282 Partial rank correlation 
8. We may establish the remarkable result

$$\tau_{23.1} = \frac{\tau_{23} - \tau_{12}\tau_{13}}{\{(1 - \tau_{12}^2)(1 - \tau_{13}^2)\}^{1/2}}. \qquad (15)$$

In fact, from expression (12) we see that

$$\tau_{12} = \frac{(a+b) - (c+d)}{N}, \qquad
\tau_{13} = \frac{(a+c) - (b+d)}{N}, \qquad
\tau_{23} = \frac{(a+d) - (b+c)}{N}.$$

Remembering that N = a + b + c + d we have

$$1 - \tau_{12}^2 = \frac{4}{N^2}(a+b)(c+d),$$

$$1 - \tau_{13}^2 = \frac{4}{N^2}(a+c)(b+d),$$

$$\tau_{23} - \tau_{12}\tau_{13} = \frac{1}{N^2}\big[(a+b+c+d)\{(a+d) - (b+c)\} - \{(a+b) - (c+d)\}\{(a+c) - (b+d)\}\big] = \frac{4}{N^2}(ad - bc).$$

Equation (15) follows at once from the definition of partial τ in equation (13).

The appearance of the product-moment type of relation between total and
partial correlations is surprising. There was no reason to expect that partial τ,
which is a pure function of disarrangements in rankings and is not expressible
algebraically in terms of the ranks, should bear any analogy with the partial
correlation of variates; but since it does so, we are evidently fortified in regarding
partial τ as a convenient measure of rank correlation.

Example. Ten men are ranked according to (1) intelligence, (2) mathematical ability,
(3) musical ability. The rankings are:

1:   1  2  3  4  5  6  7   8  9  10
2:   1  4  5  6  2  7  3   9  8  10
3:   4  1  3  5  2  6  7  10  9   8

It will be found that τ₁₂ = 0.644, τ₁₃ = 0.644, τ₂₃ = 0.555. Thus mathematical and
musical ability are positively correlated. The question is, can this correlation be attributed
to the correlation of both with intelligence?

We find

$$\tau_{23.1} = \frac{0.555 - (0.644)^2}{1 - (0.644)^2} = 0.24.$$

The conclusion would be that although part of the total correlation is due to the correlation
of both with intelligence, part of it is not. We cannot attribute the whole of the observed
(total) correlation between mathematical and musical ability to the interference of common
correlation with intelligence.
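
The figures of this example can be reproduced with a short calculation. The sketch below is an illustration only: rankings 2 and 3 are entered as 1 4 5 6 2 7 3 9 8 10 and 4 1 3 5 2 6 7 10 9 8 (readings that return the quoted values of τ), and the function names are hypothetical.

```python
from itertools import combinations
from math import sqrt

def tau(x, y):
    """Kendall's tau: 2(S1 - S2) / (n(n - 1)) over all pairs of members."""
    n = len(x)
    score = 0
    for i, j in combinations(range(n), 2):
        score += 1 if (x[i] - x[j]) * (y[i] - y[j]) > 0 else -1
    return 2.0 * score / (n * (n - 1))

def partial_tau(t23, t12, t13):
    """Equation (15): partial tau of 2 and 3 on 1 from the three total coefficients."""
    return (t23 - t12 * t13) / sqrt((1.0 - t12 ** 2) * (1.0 - t13 ** 2))

intelligence = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
mathematics = [1, 4, 5, 6, 2, 7, 3, 9, 8, 10]    # assumed reading of ranking 2
music = [4, 1, 3, 5, 2, 6, 7, 10, 9, 8]          # assumed reading of ranking 3

t12 = tau(intelligence, mathematics)
t13 = tau(intelligence, music)
t23 = tau(mathematics, music)
t23_1 = partial_tau(t23, t12, t13)
print(round(t12, 3), round(t13, 3), round(t23, 3), round(t23_1, 2))
```

The function assumes rankings without ties, as in the example.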


M. G. KENDALL l 283 


9. The same methods are immediately capable of extension to paired
comparisons. In fact the array of type (10) is the array of paired comparisons for all
the possible pairs of ranks and when there are no constraints of the ranking
character (i.e. such that if A → B and B → C then must A → C) the coefficient τ
becomes a measure of agreement in paired comparisons (cf. Kendall & Babington
Smith, 1940). We could then construct measures of partial agreement by the
same formulae in cases where it was suspected that there were mutual influences
at work between three observers, as for instance if it were suspected that
community of preference between two children was due to community of both with
one of the parents.

10. In conclusion, it may perhaps be desirable to point out that although
partial τ is defined by equation (14) in terms of χ² for a fourfold table, its
significance cannot on that account be tested in the Type III χ² distribution with one
degree of freedom. I have not yet succeeded in finding expressions for the sampling
distribution of partial τ but it seems clear that the Type III distribution will not
be reproduced in the ranking case, at least without some substantial modification,
even when the rankings are independent; for there exist correlations between the
signs given by any ranking in the recorded order. If, for example, (12) and (23)
are positive so must be (13), whereas if (13) and (23) are positive (12) may be
either positive or negative. The units in the fourfold table cannot therefore be
regarded as allocated at random and the Type III distribution will probably not
hold. I hope to return to this subject on a later occasion.


REFERENCES 


KENDALL, M. G. (1938). A new measure of rank correlation. Biometrika, 30, 81.

KENDALL, M. G. & BABINGTON SMITH, B. (1940). On the method of paired comparisons.
Biometrika, 31, 324.

PEARSON, KARL & HERON, D. (1913). On theories of association. Biometrika, 9, 159.

YULE, G. UDNY (1912). On the methods of measuring the association between two
attributes. J.R. Statist. Soc. 75, 579.




INEQUALITIES FOR MULTIVARIATE FREQUENCY 
DISTRIBUTIONS 


By C. E. V. LESER 


GIVEN a frequency distribution y of a single variate x, with arithmetic mean
x̄ = 0 and with standard deviation σ, Tchebycheff's (1867) well-known inequality
presents a lower limit for the ratio of the frequency of all values of x between
−λσ and λσ to the total frequency, where λ ≥ 1. In the case of the special class
of frequency distributions for which y(x) + y(−x) is a non-increasing function of
|x|, this inequality can be substantially improved to another limit which applies
to all positive values of λ, as already Gauss (1823) has proved. Various authors*
have generalized these theorems by modifying the assumptions made with
regard to the frequency distribution, by introducing moments of higher order
than the second or by extending some of the results to bivariate functions. In
the present analysis only moments of second order are used, but the results apply
to frequency functions of any number of variates.

Suppose y(x₁, ..., xₙ) to be a frequency distribution of n variates, with total
frequency equal to unity, arithmetic mean at the origin and with standard
deviations σ₁, ..., σₙ. Let P be the frequency of all combinations of x₁, ..., xₙ


for which : E v \2 
(x2. fm Sn. 

‘ n n i 

Wate ho FD τα o= nsi 
so that AZ and o? are the harmonic averages of A2, ..., A$ and o, ..., 03 respectively. 
We also write a, 
μις τε, tat (9 = 1, ...,n), 
Ay σο j 


ει ο cs) ++] 
" MG) τν αντ = Avo λισι ai As, 7 
so that P is the frequency of all values of z, ..., 2, for which R/,JnSAgo,. Further- 


more, we define A(R,) as the average value of y for all those values of.2,, ..., £p 
for which # has a fixed value R,, and therefore 


faar ae 


| sam ων 
RoR, 


* E.g. K. Pearson (1919), B. H. Camp (1922), S. Narumi (1923), C. D. Smith (1930), P. O. Berge
(1937).





A(R) = 


κ. D 


We can now state the following:

THEOREM. Assuming the frequency distribution to be such that A(R) is a
non-increasing function of R for $R/\sqrt{n} \le \kappa\sigma_0$, we have one of the following three
sets of inequalities, according to the value of $\kappa$:

I. $\kappa \le 1$:
(i) $\lambda_0 \le 1$, $P \ge 0$,
(ii) $1 \le \lambda_0$, $P \ge 1 - \dfrac{1}{\lambda_0^2}$.
II lsk «(55 
n 
: 2 yn n+2 1} (Ag\" 
o sii) © P ασ) 
2 yn a 
«λος 21-5, 
a) (3) KSÀ SK Pe K? 
τα 
(iii) KSA, ΡΕ πε 
i 
ΤΠ. (= EK 
n 
. l 9 Un /n+2\t n 7 a 
ὦ λα) CI) 7 ο A 
2 \'m in 4+2\t_ . ο 2 Ve " ar 
sof 2 n+2 2 sia a, 
ὦ) (3) ( n sx 5( 5) ; P2l-\iy3) X 
dés 2. Xm Y ` 
- (iii) (23). KSA SK Pzi gd 
l : b 
(iv) KSA, : Paloi 
Proof. We introduce the new system of co-ordinates
$$x_1 = \mu_1\tau_1 R\cos t_1\cos t_2\cos t_3 \ldots \cos t_{n-1},$$
$$x_2 = \mu_2\tau_2 R\sin t_1\cos t_2\cos t_3 \ldots \cos t_{n-1},$$
$$x_3 = \mu_3\tau_3 R\sin t_2\cos t_3 \ldots \cos t_{n-1},$$
$$x_n = \mu_n\tau_n R\sin t_{n-1},$$
so that
$$y(x_1, \ldots, x_n)\,dx_1 \ldots dx_n = z(R, t_1, \ldots, t_{n-1})\,R^{n-1}\,dR\,dt_1 \ldots dt_{n-1},$$
where
$$z(R, t_1, \ldots, t_{n-1}) = y(R, t_1, \ldots, t_{n-1})\cos t_2\cos^2 t_3 \ldots \cos^{n-2} t_{n-1}.$$
We also write
$$s = \sigma_0\sqrt{n},$$
and we are going to use the following abbreviating symbol


E ἃ. fir in 7 
ΝΕ το. 
T πο αμ ᾿ 
Then - . i f. R-12(R, 1) d Rit = 1, 
E > R-0 
n ]- Re-12(R, 1) dRdt = P, 
R=0 


and as A(R) = const. [E 2(R, t) dt, we have the condition that | 2(R, ὁ) dt is a non- 
KE function of R for R S xs. Furthermore, let us write. i 


-6- [ios na 2 


and later on . Er Dar. 


Now start from the equation 
2 7 R | 
PE 
τι 
s EN (a) + +(e πο dag 
ES TA PS τη]. : 


which can be written 





(Ago) s Ea ta): i m [Bree aR ᾽ 


or - + " k Ξ Id, 
E zn " mxw(RudRd, L= in jt R-+1( R, t) idt. 
Έλις 
According to the value of κα, three cases have to be distinguished. 


(a) az | e σα or : Pz1— (x«^—A$)«. 


_ For EX As | 2(R, t) ἄ z: G. 
- T 
Aas 
Hence ' 156 ΠΠΩΕ, 
"rur p. -ᾱ Š 
— Ae ; 
and incidentally PzG|" m-igB—Ap. ` 


220 


4 


To obtain a limit for $I_2$, we make use of the equation


- (0 [aatia pj” 
Í [ Άν ΙΕ, )ἆβάι--1- Ρ-- of Ένα. : 
R=A,3J T ; R=A,8 


> Í ^L. RR, 0) ἀΒάι. 
τ ως. P)| 


eects , 
- Í rel g- | «(R, t) σε] dR. 
R-A6 ; T . 


As the integrands are nowhere negative and R is nowhere smaller in the integral
on the left than in the integral on the right, we can write


E PE R2(R, 0d Rdt | 
-[α91δα-- p" 


[azam] 
ReAs 


Qusa-Sa- p^ 
μπα G »] 


Ræà,8 | 


ra| a- IK: (R, nat jan, 


R^G, 


and therefore 


[0.974 2a-5]^ [αιλ sy n(1— βαν 
2 oo +1 ἘΞ 0 
szaf E EU ir d 

nhn + 2) ur z [πλω 4-1 — Βλ] εδ, 
nin42) ` 
P21+Apu— Mul unD, 
n Un 
O asses αρα e Psi-(e—3w Ai 
As before, . LzG[(" medR and PzAgu. ; 


1-0 


Furthermore, we have the equation 


5 G 
[ Í Rz(R,t)dRdt=1-P =Q Lo RRi- P= (x— Ape], 
R-eM8sjT ' R=A.3 


is ο HB dat = i rel af «(dodi an 
Rext R-A8 T 


ο... 


and, as before, it follows that
m [e AR )4Rdiz [' "Rel a- [μι «(R, ons 
R-xs R-AÀ, - 
mE £e i-r Semel]. 
Lz e. Rd RE + (xs)? |: -Ρ αν — At) | ; 
R=À,8 


Hence ` aza ‘RHR + (xs)? È -Ρ : (x? — Ag) s] ; 
: R=0 - 





12 
n+ 


P2 dea ο. 
(ο) KESAS Or pa 

In this case, $I_1$ need not be larger than 0, but
$$I_2 \ge (\lambda_0 s)^2 (1 - P),$$
$$s^2 \ge (\lambda_0 s)^2 (1 - P).$$


kn tte +x°[1 — P (κ --λδ) ul, 


η 





΄ 


Therefore, if $\lambda_0 \ge 1$, $P \ge 1 - 1/\lambda_0^2$, and if $\lambda_0 \le 1$, no significant limit exists.


For $\lambda_0 \ge \kappa$, the problem has thus been solved, but the case $\lambda_0 \le \kappa$ is more
complicated; summarizing the results of (a) and (b), we have two alternative sets
of inequalities. Write

, Ae) = = Agu, falu) -1-(é- -AQ)u : 
Jalu) = 1-- Apu CP. um, 
: - 00 πώς μη «(a 


je)» 

Then either 7 ΡΞ], ΡΞ), de 
ος ᾿ * od Ρο], PSfy Pf. i 

For different frequency distributions, G and therefore u may assume any 
non-negative value which is compatible with the condition PS1. We have to 
find the effective lower limit for P as a function of u and its minimum value 
which, as easily seen, is only x than 0, if f,(0) 20 or, what is the same thing, 
k»1. i 

Ja fe Ja are straight lines. ie T : 
a Hy 2A- 2 (n. ye 

n\n 





quom» . 


- 




- 


f, has a minimum when 


: peo nt 9 mam 1^ 2( 2 Mm r 
-= ~ n \n+2 APPS n\n +2) Ap? 


΄ 9 231 n42/ 2°\% 1 2 Yn] 
and (mn). Τμ [ο ο. 2 Ut *j_ 47) 2.1L VES 
fi ars) A n (53) Ai E (5) Ab 
Indicating by $u_{ij}$, $f_{ij}$ the co-ordinates of the point in which the curves $f_i$ and $f_j$
intersect, we find that








RE s δὲ = (s), 
ο. «ει τν 


n+2 1\1 n+2 1} /Ay\* 
en aa fe -aG 


fas fg and fa have a point (tosa, Jaga) in common in which f, is the tangent to fs: 


4 n+2 1 nd 2kx^—AD 


Maga = καπ’ fm —1—— pna C 


It is also seen that the sign of both expressions ts and 4,4,— ugn) jg positive 


" 3 2 Mis 
or negative according to whether A, 2 (3) K. 
i . 


We have therefore four possibilities. In the following diagrams, the heavy
line shows the effective lower limit for P as a function of u (the upper limit is
always equal to 1, of course), and its minimum gives the general lower limit for P.
In these four cases, the following results are obtained:


(1)) : Uzga 2 Mag sis Syn 
(2) df, ; m2 Uys 2f 
(3) du Sj "mai e ipod su, Pafen 
df, l 
4 P2f,(0 
(4) d xo 2 f4(0) 
and by substituting the proper values: ' 
n+2 Cop 9 y (o πμ 1YVÀ, 
(1) I&« {7 - | Gn dos (5) κ ος 
. "9 ijn ην 4-92 n in " 
2 S 5. sss) Ca)? (za) » 
. kZzl —— - 
σα 9 y 2 ^ (n-9 5 2 ^1 
^ (s) «A2 (L5) ( n ) Pals Το) AP 
2 y d 
(4) κε], (53) KSÀ, Sk . ῬΡΞ1---- 


In addition, we know that for $\lambda_0 \ge \kappa$, $\lambda_0 \ge 1$: $P \ge 1 - 1/\lambda_0^2$. By rearranging these
inequalities, we can bring them into the form in which they were given in the
theorem, which is therefore proved.

It may be remarked that $\kappa$ depends on the ratios between any two of the
quantities $\lambda_1, \ldots, \lambda_n$, but it remains the same, if $\lambda_1, \ldots, \lambda_n$ change in the same
proportion.

Let us consider the sets of inequalities I, II and III of the theorem separately.
The most important case in which set I is relevant occurs when nothing about the
frequency distribution is known except the averages and standard deviations of the
variables, in which case we have to put $\kappa = 0$. It provides a generalization of
Tchebycheff's inequality which is obtained in the special case n = 1 in which $\lambda_0 = \lambda$.
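The generalized Tchebycheff limit of set I can be checked numerically. The following is only a minimal sketch, assuming the limit takes the form $P \ge 1 - 1/\lambda_0^2$ with $\lambda_0^2$ the harmonic average of the $\lambda_i^2$; the function names, the multipliers and the artificial skew sample are illustrative assumptions and are not part of the paper.

```python
import random

def harmonic_mean_of_squares(values):
    """Harmonic average of the squared values: n / sum(1 / v_i^2)."""
    n = len(values)
    return n / sum(1.0 / (v * v) for v in values)

def tchebycheff_limit(lams):
    """Assumed form of the set I limit: 1 - 1/lambda_0^2, or 0 when lambda_0 <= 1."""
    lam0_sq = harmonic_mean_of_squares(lams)
    return max(0.0, 1.0 - 1.0 / lam0_sq)

def frequency_inside(points, lams):
    """Fraction of points with sum((x_i / (lam_i * sigma_i))^2) <= n.

    Means and standard deviations are those of the sample itself
    (divisor N), so the sample plays the part of the whole distribution.
    """
    n = len(lams)
    count = len(points)
    means = [sum(p[i] for p in points) / count for i in range(n)]
    centred = [[p[i] - means[i] for i in range(n)] for p in points]
    sigmas = [(sum(c[i] ** 2 for c in centred) / count) ** 0.5 for i in range(n)]
    inside = 0
    for c in centred:
        q = sum((c[i] / (lams[i] * sigmas[i])) ** 2 for i in range(n))
        if q <= n:
            inside += 1
    return inside / count

# Hypothetical skew bivariate sample, used only for illustration.
random.seed(1)
sample = [(random.expovariate(1.0), random.uniform(-1.0, 1.0) ** 3)
          for _ in range(2000)]
lams = [1.5, 3.0]
print(tchebycheff_limit(lams), frequency_inside(sample, lams))
```

Because the moments are taken from the sample itself, the observed frequency cannot fall below the limit, whatever the shape of the sample.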


If n = 2, the inequality refers to the frequency of all points lying inside the
ellipse which has the axes $\lambda_1\sqrt{2}\,\sigma_1$, $\lambda_2\sqrt{2}\,\sigma_2$, and it can be written in the following
way:
$$P \ge 1 - \frac{1}{2}\left(\frac{1}{\lambda_1^2} + \frac{1}{\lambda_2^2}\right).$$
It is interesting to compare this limit with the one given by Berge (1937) for a
rectangle which has its corners in the points with the co-ordinates $(\pm\lambda\sigma_1, \pm\lambda\sigma_2)$:
$$P \ge 1 - \frac{1 + \sqrt{1 - r^2}}{\lambda^2},$$
where r is the correlation coefficient of the frequency distribution. If r = 0,
Berge's limit equals $1 - 2/\lambda^2$ and is therefore equal to the limit we get for the
ellipse with the axes $\lambda\sigma_1$, $\lambda\sigma_2$, which is inscribed in the rectangle. On the other
hand, if $r = \pm 1$, Berge's limit becomes equal to $1 - 1/\lambda^2$, and we have to choose
the circumscribed ellipse with the axes $\lambda\sqrt{2}\,\sigma_1$, $\lambda\sqrt{2}\,\sigma_2$ to obtain the same limit.
Hence, the limit given here is certainly not inferior to the limit given by Berge,
if the two variates are independent, but it is not superior to this limit, if there is
a perfect correlation.
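The comparison with Berge's limit can be illustrated by a short calculation. This is a sketch only: it assumes the ellipse limit in the form $1 - \tfrac{1}{2}(1/\lambda_1^2 + 1/\lambda_2^2)$ and Berge's limit in the form $1 - (1 + \sqrt{1 - r^2})/\lambda^2$; the function names are hypothetical.

```python
import math

def ellipse_limit(lam1, lam2):
    """Limit for the ellipse with axes lam1*sqrt(2)*sigma1 and lam2*sqrt(2)*sigma2."""
    return 1.0 - 0.5 * (1.0 / lam1 ** 2 + 1.0 / lam2 ** 2)

def berge_limit(lam, r):
    """Assumed form of Berge's limit for the rectangle with corners (+-lam*sigma1, +-lam*sigma2)."""
    return 1.0 - (1.0 + math.sqrt(1.0 - r * r)) / lam ** 2

lam = 3.0
# Inscribed ellipse (axes lam*sigma_i): lam_i * sqrt(2) = lam.
inscribed = ellipse_limit(lam / math.sqrt(2.0), lam / math.sqrt(2.0))
# Circumscribed ellipse (axes lam*sqrt(2)*sigma_i): lam_i = lam.
circumscribed = ellipse_limit(lam, lam)

print(inscribed, berge_limit(lam, 0.0))      # equal when r = 0
print(circumscribed, berge_limit(lam, 1.0))  # equal when r = +-1
```

For intermediate values of r, Berge's limit lies between the two ellipse limits.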

Set II is perhaps of more theoretical than practical interest. An intermediate
value of $\kappa$ may, however, be realized, if there is sufficient information about the
frequencies inside, but not outside, a certain n-dimensional interval. The
inequalities correspond to those given by Narumi (1923) for frequency distributions
of one variate only, but have a more general meaning in so far as P may refer
to multiples of other quantities besides the standard deviation. The first inequality
seems the most interesting one of this set; it reduces to a special case of Narumi's
inequality if we put n = 1:


-- 
Ashe: Pz (i)i 4 
and for n = 2 it may be written 5 


RI 


Ed ly 1 1 

ὦ ata Ρε Μι ο) are 

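Set III generalizes Gauss's inequalities. It reduces to the first two inequalities
for those values of $\lambda_1, \ldots, \lambda_n$ (if any) for which $\kappa = \infty$. This is the case for all
values of $\lambda_1, \ldots, \lambda_n$, if $y(hx_1, \ldots, hx_n) + y(-hx_1, \ldots, -hx_n)$ is a non-increasing
function of $|h|$ for any fixed values of $x_1, \ldots, x_n$; especially if the function y
decreases monotonically along every straight line radiating from the origin.

The assumption $\kappa = \infty$ will be made throughout the following analysis.
Gauss's formulae are obtained by substituting n = 1:
$$\lambda \le \frac{2}{\sqrt{3}}: \quad P \ge \frac{\lambda}{\sqrt{3}},$$
$$\lambda \ge \frac{2}{\sqrt{3}}: \quad P \ge 1 - \frac{4}{9\lambda^2}.$$
In the case n = 2, the inequalities are also greatly simplified:
$$\lambda_0 \le 1: \quad P \ge \frac{\lambda_0^2}{2},$$
$$\lambda_0 \ge 1: \quad P \ge 1 - \frac{1}{2\lambda_0^2},$$
or
$$\frac{1}{\lambda_1^2} + \frac{1}{\lambda_2^2} \ge 2: \quad P \ge \frac{1}{1/\lambda_1^2 + 1/\lambda_2^2},$$
$$\frac{1}{\lambda_1^2} + \frac{1}{\lambda_2^2} \le 2: \quad P \ge 1 - \frac{1}{4}\left(\frac{1}{\lambda_1^2} + \frac{1}{\lambda_2^2}\right).$$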


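Returning to the case of an unspecified value of n, we shall often be interested
in a limit for the frequency of all values of the standardized or original variables
for which the distance from the origin is no more than a certain value, i.e. either
$$\left(\frac{x_1}{\sigma_1}\right)^2 + \ldots + \left(\frac{x_n}{\sigma_n}\right)^2 \le \mu^2,$$
or
$$x_1^2 + \ldots + x_n^2 \le \rho^2.$$
In the first problem, we have
$$\lambda_1 = \ldots = \lambda_n = \lambda_0 = \frac{\mu}{\sqrt{n}},$$
and our inequalities can therefore be written
$$\mu^2 \le 2^{2/n}(n+2)^{(n-2)/n}: \quad P \ge \left(\frac{\mu^2}{n+2}\right)^{n/2},$$
$$\mu^2 \ge 2^{2/n}(n+2)^{(n-2)/n}: \quad P \ge 1 - \left(\frac{2}{n+2}\right)^{2/n}\frac{n}{\mu^2}.$$
In the second problem
$$\lambda_i = \frac{\rho}{\sigma_i\sqrt{n}} \quad (i = 1, \ldots, n), \qquad \lambda_0^2 = \frac{\rho^2}{\sigma_1^2 + \ldots + \sigma_n^2}.$$
Hence, if
$$\frac{\rho^2}{\sigma_1^2 + \ldots + \sigma_n^2} \le \frac{1}{n}\,2^{2/n}(n+2)^{(n-2)/n}: \quad P \ge \left(\frac{n\rho^2}{(n+2)(\sigma_1^2 + \ldots + \sigma_n^2)}\right)^{n/2},$$
$$\frac{\rho^2}{\sigma_1^2 + \ldots + \sigma_n^2} \ge \frac{1}{n}\,2^{2/n}(n+2)^{(n-2)/n}: \quad P \ge 1 - \left(\frac{2}{n+2}\right)^{2/n}\frac{\sigma_1^2 + \ldots + \sigma_n^2}{\rho^2}.$$
Again, the insertion of special values for n will simplify the formulae.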

Finally, it may be at least of theoretical interest to consider the generalized
Gauss limit, as obtained for $\kappa = \infty$, as a function of n for fixed values of $\lambda_0$, and
to compare it with the generalized Tchebycheff limit which is obtained for $\kappa \le 1$
and is independent of n. For some particular values of $\lambda_0$ and n, this is done in
the following table:

                         λ_0 = 2     λ_0 = 3
Gauss limit
    n = 1                 0.889       0.951
    n = 2                 0.875       0.944
    n = 3                 0.864       0.940
    n = 4                 0.856       0.936
Tchebycheff limit         0.750       0.889


Furthermore, since for $\lambda_0 \le \left(\dfrac{2}{n+2}\right)^{1/n}\left(\dfrac{n+2}{n}\right)^{1/2}$
$$\lim_{n\to\infty}\left(\frac{n}{n+2}\right)^{n/2}\lambda_0^n \le \lim_{n\to\infty}\frac{2}{n+2} = 0,$$
and since
$$\lim_{n\to\infty}\left\{1 - \left(\frac{2}{n+2}\right)^{2/n}\frac{1}{\lambda_0^2}\right\} = 1 - \frac{1}{\lambda_0^2},$$
it is seen that with an increasing number of variates, the Gauss limit gradually
loses its superiority over the Tchebycheff limit and the difference between the
two limits tends to vanish.
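The behaviour of the two limits can be traced numerically. The sketch below assumes the generalized Gauss limit for $\kappa = \infty$ in the two-branch form $(n/(n+2))^{n/2}\lambda_0^n$ for small $\lambda_0$ and $1 - (2/(n+2))^{2/n}/\lambda_0^2$ otherwise, the branches meeting where both equal $2/(n+2)$; this form reduces to Gauss's formulae for n = 1, and the function names are illustrative only.

```python
import math

def gauss_limit(lam0, n):
    """Assumed generalized Gauss limit (kappa = infinity) for n variates."""
    # Squared value of lambda_0 at which the two branches meet.
    threshold_sq = ((n + 2) / n) * (2 / (n + 2)) ** (2 / n)
    if lam0 * lam0 <= threshold_sq:
        return (n / (n + 2)) ** (n / 2) * lam0 ** n
    return 1 - (2 / (n + 2)) ** (2 / n) / lam0 ** 2

def tchebycheff_limit(lam0):
    """Generalized Tchebycheff limit, independent of n."""
    return max(0.0, 1 - 1 / lam0 ** 2)

for lam0 in (2, 3):
    row = [round(gauss_limit(lam0, n), 3) for n in (1, 2, 3, 4)]
    print(lam0, row, round(tchebycheff_limit(lam0), 3))

# With many variates the Gauss limit approaches the Tchebycheff limit.
print(gauss_limit(2, 1000) - tchebycheff_limit(2))
```

For $\lambda_0 = 2$ and $\lambda_0 = 3$ with n = 1, ..., 4 the function returns the tabulated Gauss limits.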


* 


REFERENCES 


BERGE, P. O. (1937). A note on a form of Tchebycheff's theorem for two variables.
Biometrika, 29, 405-6.

CAMP, B. H. (1922). A new generalization of Tchebycheff's statistical inequality. Bull.
Amer. Math. Soc. 28, 427-32.

GAUSS (1823). Theoria combinationis observationum erroribus minimis obnoxiae, Göttingen,
pp. 9-12.

NARUMI, S. (1923). On further inequalities with possible applications to problems in the
theory of probability. Biometrika, 15, 245-53.

PEARSON, K. (1919). On generalized Tchebycheff theorems in the mathematical theory
of statistics. Biometrika, 12, 284-98.

SMITH, C. D. (1930). On generalized Tchebycheff inequalities in mathematical statistics.
Amer. J. Math. 52, 109-26.

TCHEBYCHEFF (1867). Des valeurs moyennes. J. Math. pures appl. ser. 2, 12, 171-84.


THE MODE AND MEDIAN OF A NEARLY NORMAL 
, DISTRIBUTION WITH GIVEN CUMULANTS 


By J. B. S. HALDANE, F.R.S. 


IN the early years of biometry the mode and median of a distribution, and
especially the latter, were regarded as being almost as important as the mean.
Later Pearson and others developed the method of moments, and recently the
cumulants, which are readily derived from the moments, have been widely used.
Pearson (1895 and after) discussed the relation of the mean, mode and median
in some of the skew frequency curves which he invented. He found empirically
that for skew curves of Type III, namely,
$$y = y_0\left(1 + \frac{x}{a}\right)^p e^{-px/a},$$
when p is positive, Mode − Mean = 3 (Median − Mean) approximately. By fitting
for a series of integral values of p he found
$$\frac{\text{Median} - \text{Mode}}{\text{Mean} - \text{Mode}} = 0.6691 + 0.0094\,p^{-1}.$$
Some later writers have taken the trisection as a general law. But as we shall
see, it does not always hold, even approximately. So far as I know, general
expressions for the mode and median in terms of the cumulants have not been
given, nor have the conditions been stated under which Pearson's rule holds
approximately.

Consider a variate X, with distribution df = F(X) dX and cumulants
$\kappa_1 = m$, $\kappa_2 = \sigma^2$, $\kappa_3, \ldots, \kappa_r, \ldots$

The algebra is simplified if we make the transformation $x = (X - m)/\sigma$, so
that x has mean zero, and unit standard deviation, its cumulants being 0, 1, $\gamma_1$,
$\gamma_2, \ldots, \gamma_r, \ldots$, where $\gamma_r = \kappa_{r+2}\,\kappa_2^{-(r+2)/2}$. Thus $\gamma_r$ is the rth measure of the deviation from
normality of the distribution of X. Now $\gamma_r$ may become infinite for all values
of r, or for all even values of r, above a certain value. It may diminish indefinitely
or remain small. In other cases it falls with r at first, but then increases without
limit. Thus for Pearson's Type III distribution, whose equation has been given
above, 


-e-9i o£. and y,-(r-1)!(p-r 1). 


And in the case of any estimate of a statistic, such as the mean, variance,
standard deviation, or skewness, based on a sample of n members, $\gamma_r = O(n^{-r/2})$.
We shall not discuss the convergence of the series obtained later. But it is worth
noting that expansions in Hermitian polynomials are often satisfactory asymptotic
expansions, even if they diverge.
Let df = f(x) dx be the distribution of x. Then it is known that we may write,
symbolically,*





ο πο e 


Expanding in Hermitian enn A(x) = dec ε-ἰσ", we have 


—iz? 
fe - sb Bo) + Bo) - 5 Ha) 


ortin Hæj- 3571 γι + Ys gr (a) +.. | . 


+ ΤΙ 


In the special case where y, = Ο( 1) we have 


get 
,ία) 


7 Js) ο Άπω 


1 n á Hy) + oa | ; 


To find the mode, we put f(z) = 0. Hence d 


1-89 =, 








He) - s Bre Hyg) ... = 0. 


The terms needed to find the root of this equation with the smallest absolute 
value depend, of course, on the behaviour of the cumulants as r increases. If 
| ]γε]» [γα], etc., are not sabstantially less than | y, |, but all are small, so that 
powers and products can be neglected, then: 

Ys 


zac Lil. 


. 2` 8 48 576 
Where y, = O(n"), we have 








Hence t= nB ae O(n). 
So in the first case the mode is . 
mto -24% 2 CP Yea ΠΩ 
E ae mat eT hea (1) 


* For a comparison of this expansion with that of the Charlier Type A distribution, see Cramér
(1937). The symbolic relation has also been used recently by Cornish & Fisher (1937).


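If $\kappa_r = O(n)$, then the mode is
$$m - \tfrac{1}{2}\sigma\left(\gamma_1 - \frac{\gamma_3}{4} + \frac{5\gamma_1\gamma_2}{6} - \frac{\gamma_1^3}{2}\right) + O(n^{-2})$$
$$= \kappa_1 - \frac{\kappa_3}{2\kappa_2} + \frac{\kappa_5}{8\kappa_2^2} - \frac{5\kappa_3\kappa_4}{12\kappa_2^3} + \frac{\kappa_3^3}{4\kappa_2^4} + O(n^{-2})$$
$$= m - \frac{\mu_3}{2\mu_2} + \frac{\mu_5}{8\mu_2^2} - \frac{5\mu_3\mu_4}{12\mu_2^3} + \frac{\mu_3^3}{4\mu_2^4} + O(n^{-2}). \qquad (2)$$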


The median is the valie of x for which Í 7 . f(u) du = 4, or,.since 


1 ΠΝ ΚΝ, 
ση] οταν 
the value for which ᾽ 


z — iat 
al, μαι T 0. 


So å >o o at [ età du 7 By) 2: 2 Hz) 18 Hua)... 


130 
Tf y, does not fall off systematically we TS 


lig Y. Ys ΝΗ En (=) Yara + 


6 40 336 Frrr O] τ 


When κ, = O(n), we find 


: 1 
se 423. Aeh 7 δγιγε 3574 | Oln) = 0. 


6 40 48. 432 


i ο ο Ms ys, 7 
Hence . g= 6 40 ig taaa τοί =; 


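So if powers and products can be neglected the median is
$$m - \sigma\left(\frac{\gamma_1}{6} - \frac{\gamma_3}{40} + \frac{\gamma_5}{336} - \ldots\right) = \kappa_1 - \frac{\kappa_3}{6\kappa_2} + \frac{\kappa_5}{40\kappa_2^2} - \frac{\kappa_7}{336\kappa_2^3} + \ldots + \frac{(-1)^r\,\kappa_{2r+1}}{2^r\,r!\,(2r+1)\,\kappa_2^r} + \ldots \qquad (3)$$
And when $\kappa_r = O(n)$, the median is
$$m - \sigma\left(\frac{\gamma_1}{6} - \frac{\gamma_3}{40} + \frac{\gamma_1\gamma_2}{12} - \frac{17\gamma_1^3}{324}\right) + O(n^{-2})$$
$$= \kappa_1 - \frac{\kappa_3}{6\kappa_2} + \frac{\kappa_5}{40\kappa_2^2} - \frac{\kappa_3\kappa_4}{12\kappa_2^3} + \frac{17\kappa_3^3}{324\kappa_2^4} + O(n^{-2})$$
$$= m - \frac{\mu_3}{6\mu_2} + \frac{\mu_5}{40\mu_2^2} - \frac{\mu_3\mu_4}{12\mu_2^3} + \frac{17\mu_3^3}{324\mu_2^4} + O(n^{-2}). \qquad (4)$$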


Thus the distance from the mean to the mode is approximately thrice that
from the mean to the median if $\gamma_2$, $\gamma_4$, etc., are small quantities (i.e. $\beta_2$ is nearly 3,
etc.) and if $\gamma_3$, $\gamma_5$, etc., are small compared with $\gamma_1$. This is often the case for
nearly normal distributions. Thus for the best known of Type III distributions,
that of $\chi^2$ for n degrees of freedom, $\kappa_1 = n$, $\kappa_2 = 2n$, $\kappa_r = 2^{r-1}(r-1)!\,n$. Hence
from equation (2) the mode is at $n - 2 + O(n^{-2})$. In fact it is n − 2 exactly. For
the median, equation (4) gives $n - \frac{2}{3} + \frac{32}{405n} + O(n^{-2})$. The ratio
(Median − Mode)/(Mean − Mode), which Pearson empirically estimated at
$0.6691 + 0.0094\,p^{-1}$, is therefore $\frac{2}{3} + \frac{8}{405p} + O(p^{-2})$.
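The $\chi^2$ example can be reproduced by direct substitution. The sketch below assumes equations (2) and (4) in their cumulant form, namely $\kappa_1 - \kappa_3/2\kappa_2 + \kappa_5/8\kappa_2^2 - 5\kappa_3\kappa_4/12\kappa_2^3 + \kappa_3^3/4\kappa_2^4$ for the mode and $\kappa_1 - \kappa_3/6\kappa_2 + \kappa_5/40\kappa_2^2 - \kappa_3\kappa_4/12\kappa_2^3 + 17\kappa_3^3/324\kappa_2^4$ for the median; the function names are hypothetical.

```python
from math import factorial

def mode_from_cumulants(k1, k2, k3, k4, k5):
    """Approximate mode, assuming the cumulant form of equation (2)."""
    return (k1 - k3 / (2 * k2) + k5 / (8 * k2 ** 2)
            - 5 * k3 * k4 / (12 * k2 ** 3) + k3 ** 3 / (4 * k2 ** 4))

def median_from_cumulants(k1, k2, k3, k4, k5):
    """Approximate median, assuming the cumulant form of equation (4)."""
    return (k1 - k3 / (6 * k2) + k5 / (40 * k2 ** 2)
            - k3 * k4 / (12 * k2 ** 3) + 17 * k3 ** 3 / (324 * k2 ** 4))

def chi_square_cumulants(n, order=5):
    """Cumulants of chi-square with n degrees of freedom: 2^(r-1) (r-1)! n."""
    return [2 ** (r - 1) * factorial(r - 1) * n for r in range(1, order + 1)]

n = 10
kappa = chi_square_cumulants(n)
mode = mode_from_cumulants(*kappa)
median = median_from_cumulants(*kappa)
ratio = (median - mode) / (kappa[0] - mode)   # (Median - Mode)/(Mean - Mode)
print(mode, median, ratio)
```

For n = 10 the mode comes out at n − 2 and the median at $n - 2/3 + 32/405n$, close to the tabulated 50 % point of $\chi^2$, about 9.342.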


3 i 
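Consider the distribution of the estimated mean, $n^{-1}\Sigma x$, from a sample of n,
taken from a distribution with arbitrary cumulants $\kappa_1$, $\kappa_2$, $\kappa_3$, etc. The cumulants
of the distribution of the mean are $\kappa_1$, $\kappa_2/n$, $\kappa_3/n^2$, etc. Hence equations (2) and
(4) hold, and the mean is $\kappa_1$, the mode is
$$\kappa_1 - \frac{\kappa_3}{2n\kappa_2} + \left(\frac{\kappa_5}{8\kappa_2^2} - \frac{5\kappa_3\kappa_4}{12\kappa_2^3} + \frac{\kappa_3^3}{4\kappa_2^4}\right)n^{-2} + O(n^{-3}),$$
and the median is
$$\kappa_1 - \frac{\kappa_3}{6n\kappa_2} + \left(\frac{\kappa_5}{40\kappa_2^2} - \frac{\kappa_3\kappa_4}{12\kappa_2^3} + \frac{17\kappa_3^3}{324\kappa_2^4}\right)n^{-2} + O(n^{-3}).$$
The corresponding expressions in terms of the moments may be written down.
The skewness is not exactly 1/n of the skewness of the original curve, but nearly
so if n is large.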
Again consider the distribution of the estimated variance from a sample of
n members from a normal distribution with unknown mean, and variance $\sigma^2$.
The cumulants of the distribution of the estimated variance $\dfrac{1}{n-1}\left[\Sigma x^2 - n^{-1}(\Sigma x)^2\right]$
are
$$\sigma^2, \quad \frac{2\sigma^4}{n-1}, \quad \frac{8\sigma^6}{(n-1)^2}, \quad \frac{48\sigma^8}{(n-1)^3}, \quad \frac{384\sigma^{10}}{(n-1)^4}, \quad \text{etc.}$$
Thus we are dealing with a slightly transformed $\chi^2$ distribution, and the mode is
$$\sigma^2 - \frac{2\sigma^2}{n-1} = \frac{(n-3)\sigma^2}{n-1},$$
exactly. From equation (4) the median is
$$\sigma^2\left[1 - \frac{2}{3(n-1)} + \frac{32}{405(n-1)^2} + O(n^{-3})\right].$$
If, however, the distribution sampled is not normal, but has finite $\kappa_3$, etc.,
then the first three cumulants, in terms of those of the original distribution, are
$$\kappa_2, \qquad \frac{2\kappa_2^2}{n-1} + \frac{\kappa_4}{n}, \qquad \frac{8\kappa_2^3}{(n-1)^2} + \frac{4(n-2)\kappa_3^2}{n(n-1)^2} + \frac{12\kappa_2\kappa_4}{n(n-1)} + \frac{\kappa_6}{n^2}.$$
Fisher (1928) gives expressions for the next three cumulants, but these become
very complex, $\kappa(2^6)$ having 21 terms.


E 


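It follows that the mean is $\kappa_2$, the mode
$$\kappa_2 - \left[2\kappa_2 + \frac{4\kappa_3^2 + 8\kappa_2\kappa_4 + \kappa_6}{2(2\kappa_2^2 + \kappa_4)}\right]n^{-1} + O(n^{-2}),$$
and the median
$$\kappa_2 - \left[\frac{2\kappa_2}{3} + \frac{4\kappa_3^2 + 8\kappa_2\kappa_4 + \kappa_6}{6(2\kappa_2^2 + \kappa_4)}\right]n^{-1} + O(n^{-2}).$$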


The distribution of the estimate of x, is symmetrical for a μι dis- 
tribution. In general its mean is Κα, its mode 








— 558K + 216 c, + 108K K5 — ks Ko t 2Tk ky ΓΚ]. l 
eda IA Esp. ais -14 O(n-2 
ΠΕ; Kat 2(6x3 + θκξ + 9k Kk, + Ke) ἘΠΕ 


the median differing from the mean by 4 of the value given ε Shove i 
Finally the mean estimate of x, for a sample from a normal distribution is 


zerb, its mode n +O(n-*), and its median em. O(n-?). The TEE 


expressions for the Lots and median can ready be πό from Fisher’s 
(1928) equations. - 

We now pass to some cases where Pearson's trisection rule does not hold,
even approximately.

Pearson's Type I and Type IV curves are asymmetrical, and have one more
adjustable parameter than Type III. Thus $\gamma_2$, i.e. $\beta_2 - 3$, can vary independently
of $\gamma_1$, i.e. $\sqrt{\beta_1}$. In consequence the curves may be nearly symmetrical, but far from
normal. This is so if they approximate to Type II or Type VII, respectively. In
this case the even measures of divergence $\gamma_{2r}$ may be much larger than the odd
measures $\gamma_{2r+1}$, which tend to zero with symmetry. Thus we cannot neglect
higher cumulants, or products, as in equations (1)-(4). For Type IV and VII curves
the higher moments are infinite; so no formulae of the given type are possible.
For Types I and II a formula could be given, but direct calculation is clearly
preferable.
A simpler case is that of the scalene triangular distribution whose graph is
obtained by joining $(-b, 0)$, $\left(0, \dfrac{2}{a+b}\right)$ and $(a, 0)$. That is to say
$$f(x) = \frac{2(b+x)}{b(a+b)} \ \text{ for } -b \le x \le 0, \quad \text{and} \quad f(x) = \frac{2(a-x)}{a(a+b)} \ \text{ for } 0 \le x \le a.$$
Here the mean is $\tfrac{1}{3}(a-b)$, the mode 0, and the median, if a > b, is
$$a - \sqrt{\tfrac{1}{2}a(a+b)}, \quad \text{or} \quad \frac{a-b}{4} + \frac{(a-b)^2}{32a} + \frac{(a-b)^3}{128a^2} + \ldots$$
That is to say, the median is one-quarter, not one-third, of the distance from the
mean to the mode, if a − b is small compared with a. The rth moment about zero is
$$\frac{2\left[a^{r+1} - (-b)^{r+1}\right]}{(r+1)(r+2)(a+b)}.$$


_a-b _ A[2k(9 + 918)] 6 /[2k(9 + 2h?)] 
If k= Jab) then y, = BBIE 72 —-—bYs--— D , eto. 
If k is small, the odd y’s are of order k, whilst the even ones approximate to those 
of the symmetrical distribution. Moreover y, = —32y,, so the formulae 1-4 - 
clearly do not hold. | " 
We see then that formulae (2) and (4) are quite useful in a special case which is 
important in sampling theory, but have no general validity: 
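The failure of the trisection rule for the triangular distribution can be seen numerically. The sketch below assumes the triangle with vertices $(-b, 0)$, $(0, 2/(a+b))$ and $(a, 0)$, so that for a > b the median is $a - \sqrt{\tfrac{1}{2}a(a+b)}$; the function names and the particular values of a and b are illustrative only.

```python
import math

def triangular_cdf(x, a, b):
    """Distribution function of the triangle on (-b, a) with mode at 0."""
    if x <= -b:
        return 0.0
    if x <= 0:
        return (b + x) ** 2 / (b * (a + b))
    if x < a:
        return 1.0 - (a - x) ** 2 / (a * (a + b))
    return 1.0

def triangular_mean(a, b):
    return (a - b) / 3.0

def triangular_median(a, b):
    """Median for a > b, where it lies to the right of the mode."""
    return a - math.sqrt(0.5 * a * (a + b))

a, b = 1.1, 1.0
mean = triangular_mean(a, b)
median = triangular_median(a, b)
d = a - b
series = d / 4 + d ** 2 / (32 * a) + d ** 3 / (128 * a ** 2)
# Position of the median, measured from the mean towards the mode (at 0).
fraction = (mean - median) / mean
print(median, series, fraction)
```

With a = 1.1 and b = 1 the fraction is about 0.24, near the one-quarter stated in the text rather than the one-third of the trisection rule.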





SUMMARY 


Expressions are obtained by which the distances of the mean and median of 
a skew distribution can be calculated from its moments or cumulants. The series 
obtained may or may not converge. They give satisfactory results for Type III 
distributions, and for the distributions of the mean, variance, and other cumu- 
lants as estimated from samples. 


i REFERENCES 


CORNISH, E. A. & FISHER, R. A. (1937). Moments and cumulants in the specification of
distributions. Rev. Inst. Int. Statist. Pt. 4.

CRAMÉR, H. (1937). Random Variables and Probability Distributions, Cambridge
Mathematical Tracts, 86-88.

FISHER, R. A. (1928). Moments and product moments of sampling distributions. Proc.
Lond. Math. Soc. Ser. 2, 30, 199.

PEARSON, K. (1895). Contributions to the mathematical theory of evolution. II. Skew
variation in homogeneous material. Philos. Trans. 186, 343.




TABLE OF PERCENTAGE POINTS OF THE 
t-DISTRIBUTION 


COMPUTED BY MAXINE MERRINGTON
THE following table has been derived from Miss Thompson's Tables of Percentage Points of the
Incomplete Beta Function (Biometrika, 32, 168-81), by taking
$$t = \sqrt{\nu_2(1 - x)/(\nu_1 x)}$$
for the case $\nu_1 = 1$. t is the usual "Student" ratio based on an estimate of variance having $\nu = \nu_2$
degrees of freedom. If $t_P$ is the quantity tabled, P/100 is the chance that $|t| > t_P$, i.e. represents
the area in the two tails of the t-distribution. The table includes certain levels for t not previously
available and should be accurate to the five significant figures shown.

  ν      P = 50     P = 25

   1     1.00000    2.4142
   2     0.81650    1.6036
   3     0.76489    1.4226
   4     0.74070    1.3444

   5     0.72669    1.3009
   6     0.71756    1.2733
   7     0.71114    1.2543
   8     0.70639    1.2403
   9     0.70272    1.2297

  10     0.69981    1.2213
  11     0.69745    1.2145
  12     0.69548    1.2089
  13     0.69384    1.2041
  14     0.69242    1.2001

  15     0.69120    1.1967
  16     0.69013    1.1937
  17     0.68920    1.1910
  18     0.68837    1.1887
  19     0.68763    1.1866

  20     0.68696    1.1848
  21     0.68635    1.1831
  22     0.68580    1.1816
  23     0.68531    1.1802
  24     0.68485    1.1789

  25     0.68443    1.1777
  26     0.68405    1.1766
  27     0.68370    1.1757
  28     0.68335    1.1748
  29     0.68304    1.1739

  30     0.68276    1.1731
  40     0.68066    1.1673
  60     0.67862    1.1616
 120     0.67656    1.1559
   ∞     0.67449    1.1503
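The definition of the tabled quantity can be illustrated by a direct calculation. The sketch below does not follow the method of the table (interpolation in tables of the incomplete beta function); as a swapped-in technique it integrates the t density by Simpson's rule and finds $t_P$ by bisection, so that the two-tail area equals P/100. All function names are hypothetical.

```python
import math

def t_density(t, nu):
    """Density of Student's t with nu degrees of freedom."""
    log_c = (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
             - 0.5 * math.log(nu * math.pi))
    return math.exp(log_c - (nu + 1) / 2 * math.log1p(t * t / nu))

def two_tail_area(t, nu, steps=2000):
    """Chance that |t| exceeds the given value, by Simpson's rule on [0, t]."""
    if t == 0:
        return 1.0
    h = t / steps
    total = t_density(0.0, nu) + t_density(t, nu)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_density(i * h, nu)
    return 1.0 - 2.0 * total * h / 3.0

def t_percentage_point(p_percent, nu):
    """Value t_P with two-tail area P/100, found by bisection."""
    lo, hi = 0.0, 50.0
    for _ in range(50):
        mid = 0.5 * (lo + hi)
        if two_tail_area(mid, nu) > p_percent / 100.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

for nu in (1, 2, 10):
    print(nu, round(t_percentage_point(50, nu), 5), round(t_percentage_point(25, nu), 4))
```

For ν = 1 and ν = 2 the results can be compared with the closed forms 1, $1 + \sqrt{2}$ and $\sqrt{2/3}$.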
































" 


THE PROBABILITY INTEGRAL OF THE RANGE IN SAMPLES
OF n OBSERVATIONS FROM A NORMAL POPULATION


1. FOREWORD AND TABLES


By E. S. PEARSON

1. Scope of the main table
Denote by $x_1, x_2, \ldots, x_n$ a random sample of n observations, arranged in ascending order of
magnitude, drawn from a normal or Gaussian population having for probability law
$$p(x) = \frac{1}{\sigma\sqrt{2\pi}}\,e^{-(x-\xi)^2/2\sigma^2}, \qquad (1)$$
where $\xi$ and $\sigma$ are respectively the mean and standard deviation of the population. The
range, sometimes described as the spread, in the sample is $x_n - x_1$ and we shall write the ratio
of the range to the population standard deviation as
$$w = \frac{x_n - x_1}{\sigma}. \qquad (2)$$
No simple expression exists for the probability law $f_n(w)$ of w, but Table 1 below gives
for specific values W of w computed values of the probability integral
$$P_n(W) = \int_0^W f_n(w)\,dw. \qquad (3)$$
This expression is the chance that the range in a sample of n observations is less than a
given multiple of the population standard deviation. The table has been calculated for
samples with n lying between 2 and 20 and for intervals of 0.05 of W. The values of the
integral are given to 4-decimal place accuracy; linear interpolation is adequate except in
the neighbourhood of the two quartiles in each column. The method of calculation is
described below by Dr H. O. Hartley in a separate section.


2. Auxiliary table for special uses (Table 2)

When dealing with samples containing only a small number of observations the range or
spread may often be usefully employed as a measure of dispersion in place either of the
standard (root mean square) deviation or the mean deviation. For example, an estimate of
the population standard deviation may be obtained by multiplying the range in a single
sample or the mean range in a number of samples by the factors $a_n$ shown in Table 2. The
accuracy of this form of estimation of $\sigma$ compared with that of other methods has been
discussed by Davies & Pearson (1934).

In other circumstances it may be useful to plot in serial order on a control chart the
values of range obtained from successive samples, e.g. when dealing with the control of
quality in mass production. For this purpose it is necessary to know certain standard
probability levels for w which will serve as control limits*. Twelve of these levels expressed as
percentage points and obtained by interpolation in the main Table 1, are shown in Table 2.
They replace approximate limits published a few years ago (Pearson, 1932). It will be found,
however, that except in the case n = 3†, the discrepancies between the two tables are all
small. As an example, the table shows that if samples of 7 observations are randomly drawn
from a population with a standard deviation $\sigma$, then in the long run only 5 % of those should
have a range greater than $4.17\sigma$, while 95 % should satisfy the inequality
$$1.26\sigma < x_n - x_1 < 4.49\sigma.$$
Probability levels for samples with n > 12 have not been included as the use of range for
control purposes in larger samples is of doubtful value.


* For a discussion of the use of range in problems of industrial quality control see publications
issued by the British Standards Institution (Pearson, 1935, pp. 89-90; Dudding & Jennett, 1942).
† Correct values for the case n = 3 were given by McKay & Pearson (1933).
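Individual entries of the main table can be checked by direct numerical integration. The sketch below is not the method of calculation used for the table; it assumes the standard expression $P_n(W) = n\int_{-\infty}^{\infty}\phi(x)\,[\Phi(x+W) - \Phi(x)]^{n-1}\,dx$ for the probability integral of the range in normal samples, evaluated by Simpson's rule, and the function names are hypothetical.

```python
import math

def normal_density(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def range_probability_integral(n, w, lower=-8.0, upper=8.0, steps=4000):
    """Chance that the range of n normal observations is less than w sigma."""
    def integrand(x):
        return normal_density(x) * (normal_cdf(x + w) - normal_cdf(x)) ** (n - 1)
    h = (upper - lower) / steps
    total = integrand(lower) + integrand(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * integrand(lower + i * h)
    return n * total * h / 3.0

print(round(range_probability_integral(2, 1.00), 4))   # Table 1 gives 0.5205
print(round(range_probability_integral(3, 1.00), 4))   # Table 1 gives 0.2407
print(round(range_probability_integral(7, 4.17), 3))   # upper 5 % point for n = 7
```

For n = 2 the integral reduces to erf(W/2), which gives an independent check on the first column of the table.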


vt 





Table 1. Probability integral of the range W in normal samples of size n









































^ 5 | 
| 2 3 ἃ 7 8 9 ] 40 
Ww . 
0-00 | 0-0000 | 0-0000 
0-05 | -0282 | -0007 | 0-0000 
0318 | -0564 | -0028 | -0001 
0-315 | -0845 | -0062{ -0004 
0.20 | «1126 | -0110| -0010 
0-25 | 0-1403 | 0-0171 | 0-0020 
0-30 | -1680 | -0245 | «0084 
0-35 | +1955 | -0332 | «0068 
0-40 | -2227 | «0451 | -0079 0-0000 
0-45 | -2497 | -0543 | -O111 0001 
0-50" | 0-2763 | 0-0666 | 0-0152 0:0002 | 0-0000 
0-55 | "3027 | «0800 | -0200 ! -0003 | «0001 
0-60 | «9266 | -0944 | +0257 0004 | -0001 | 0-0000 
0-65 | -3542 | -1099 | .0323 0007 | -00024 0001 
0-70 | 51894! «1268 | "0398 0011 :0008 | «0001 | 0-0000 
0:75 | 0:4041 | 0:1430-| 0:0483 0-0016 | 0-0005 | 0-0002 | 0-0001 
0-80 | :4284 | -1616 | «0678 -0023 | -0008 | «0005 | -0001 
0.85 | -4022| 1805 1. 0682 :0032 | -0011 | -0004| «0001 
0-90 | -4755 | -2000| -0797 :0044 | «0016 | «00061 -0002 
0-95 | -4983| -2201 | «0022 -0059 | «0028 | -0009 | «0008 
1-00 | 0:5205 | 0-2407 | 0-1057 0-0078 | 0-0032 | 0-0013 | 0-0005 
1.05 | +6422) -2618 | 1201 ‘0101 | -0043 | -0018 | -0008 
1-10 | 5653] -2833 | 1355 -0129 | «0067 | -0025 | 0011 
1:15 | 5839 | «3061 | 1617 ‘0163 | -0075 | «0036 | 0016 
1.20 | 6039) -3272 | -1688 0205 | «00068 | -0047 | 0022 
1-25 | 0:6282 | 0-3495 | 0-1868 0-0250 | 0-0125 | 0-0062 | 0:0030 
1-30 | -8420 | -3719 | «2064 -0304 | -0157 , 0080 | 0041 
1-35 | -6602 | -3943 | «2248 0366] -0195 | -0103 | +0054 
1-40 | -6778 | -4168 | «2448 0437 | -0240 | 0141 | -0071 
1-45 | 8948 | -43892 | 2664 «0617 | «0292 | -01864 | 0002 
1-50 | 0-7112 | 0-4614 | 0-2865 0-0606 | 0-0353 | 0-0204 | 0-0117. 
1-58 | 7269 | -4835| -3080 -0705 | -0422 | -0250 | "0148 
1:60 | -7421 | -5053 | «3299 0814 | 0499 | «0504 | 0184 
1-65 | -7067 | -5269| -3521 -0934 | -0587 | -0366 | -0227 
1:70 | +7707 | +6481 | -3745 -1064 | «06864 | +0437 | +0277 
1:75 | 0.7841 | 0-5690-| 0-3971 0-1204 | 0-0792 | 0-0517 | 0-0336 
1:80 | -7969 | -5894 | +4197 -1355 | «0910 | -0607 | "0408 
1-85 | 8002] -6094 |. 4428 1616! -1039 | +0707 | -0479 
1:90 | 8208] -6290| «4649 1666 | -1178 | -0818 | -0565 
1:95 | +8321 | -6480 | -4874 1867 | -1329 | «0940 | -0661 
2-00 | 0-8427 | 0-6665 | 0-5096 0:2056 | 0-1489 | 0-1072 | 0-0768 
2-05 | -8528 | .6845 | -5317 -2254 | -1661 | +1216 | -08868 
2-10 | -8624 | -7019 | -5534 24601 -1842 | +1371 | "1015 
215 | 8716 | -7187 | -5748 -2673 | -2033 | -1536 1166 
2-20 | -8802 | -7349 | -59057 -2893 | -2232 | -1712| -1307 
2-25 | 0-8884 | 0-7505 | 0-6163 0-3118 | 0-2440 | 0-1899 | 0-1470 
2-30 | «8061 | -7655| -63863 -3348 | -2656 | -2095| -1645 
2:35 | -0034 | -7799 | -6558 3582 | -2878 | -2300 | +1830 
240 | -9103 | -7937 | «6748 -3820 | -3107 | -2514 | "2026 
2:45 | 9168 | -8069| -6932 -4059 | -3341 | 2735 | 2280 

0-9229 | 0-8195 | 0-7110 0-3579 





0:2984 | 0-2443 | 


Table 1 (cont.). Probability integral of the range W in normal samples of size n 























^| a4 F 
| 42 | 13 
» | |a 48 | 19 | 20 
0-85 | 0-0000 
0-90 | «0001 
0-95 | -0001 | 0-0000 
1-00 | 0-0002 | 0-0001 | 0-0000 
1-05 | -0003 | -0001 | -0001 
1-10 | «0005 | «0005 | -0001 | 0-0000 
135 | -0007| -0003| -0001| -0001 
1.20 | -0010 | -0005| -0002| -0001 
1-25 | 0-0015 | 0-0007 | 0:0004 | 0-0002 
1.30 | -0021 | -0010 | :0006 | -0003 
1-35 | -0028 | -0015 | -0008 | -0004 0-0000 
1-40 | 038 | -0021| -0011| -0006 0001 
145 | -0051 | -0098| «0016 | -0009 -0001 | 0-0000 
1-50 | 0-0067 | 0-0038 | 0-0022"| 0-0012 0-0001 | 0-0001 | 0-0000 
1-55 | -0087 | -0051 | -0030| -0017 -0002 | -0001 | -0001 
1-60 | -0111 | -0067| -0040| -0024 «0003 | -0002 | -0001 
1-65 | -0140 | -0086 | -0053| -0032 .-0004 | «0003 | -0002 
1.70 | -0175 | -0111| -0069| -0043 .0007 | -0004 «0003 
1.75 | 0-0217 | 0-0140 | 0-0090 | 0-0058 0-0010 | 0-0006 | 0-0004 
1-80 | -0266| -0175 | -0115| -0078 .0013 | -0009 | -0006 
1-85 | .0323 | -0217| -0145 | -0097 0019 | -0019 | «0008 
1.90 | .0388 | -0266| «0185 | -0124 .0026 | -0018| -0012 
1-95 | «0405 | -0323| «0556 | -0156 .0035 | -0024| -0017 
2-00 | 0-0548 | 0-0389 | 0-0276 | 0-0195 0-0048 | 0-0033 | 0-0023 
2.05 | .0843 | -0465| -0335| -0241 .0083 | -0045 | -0032 
210 | .0749 | -0550| -0403| -0295 .0082 | -0060| -0043 
245 | «0866 | -0646| -0481| -0357 0106 | -0078 | 0087 
2.20 | -0994| -0753| -0569| -0429 0180 | -0102 | .-0076 
2:25 | 0-1134 | 0-0872 | 0-0669 | 0-0511 0-0172 | 0-0130 | 0-0099 
2-30 | -1286 | -1003| -0779| -0605 0215 | -0165 | -0127 
2-35 | -1450 -1145| -0902| -0709 .0265 | -0207| -0161 
2.40 | -1625| -1300| -1037| -0825 -0325 | -0256 | «0909 
2-45 | «1811 | «1466; -1183| -0953 -0394 | -0315 | -0251 
2-50 | 0-2007 | 0-1644 | 0-1342 | 0-1094 0-0474 | 0-0383 | 0-0309 

ER IERA I 









































































Table 1 (cont.). Probability integral of the range W in normal samples of size n








IB 




















2-50 | 0-9229 
2-55 | 0286 
2-60 | -9340 
2:66 | -9390 
2-70 | -9438 
2-75 | 0-9482 
2:80 | -9523 
2-85 | -9561 
2.90. «9607 
2-95 “9690 
3-00 | 0-9861 
3-05 | «9690 
3-10 | -9716 
335 | -9741 
3.20 | -9763 
3-25 | 0-9784 
3-30 | -9804 
3-35 | -9822 
3-40 | «9888 
3:45 | -9853 
3-50 | 0-9867 
3-55 | -9879 
3-60 9891 
3-65 | «9901 
3:70 9911 
3-75 | 0-9920 
3-80 | -9928 
3:85 09836 
3.90 | -9942 
3-95 | «9948 
4-00 | 0-9953 
4-05 | «9068 
410 | -9963 
4-15 | «90607 
4-20 | -9970 
4-25 | 0-9974 
430 | -9976 
435 | -9979 
440 | «9981 
4-45 | «0984 
4-50 | 0:9985 
4-55 | «0087 
4-60 | -9989 
4-65 | «0990 
4-70 | «0991 
4-75 | 0.0992 
4-80 | -9993 
4-85 | -9994 
490 | -9995 
4-95 | -9995 
5-00 | 0-9996 














0-3579 
-3820 
-4084 
-4309 
-4555 


0-4801 
-5044 
-5286 
-5525 
-5760 


0-5991 
"6216 
-8436 
-6649 
-8856 


0-7055 
"7248 
-7432 
-7609 
-7778 


0-7939 
8001 
8286 
:8372 
8501 


0-8622 
:8736 
:8842 
8841 
-9034 


0-9120 
-9199 
9215 
9841 
9404 


0-9461 
-9514 
-9562 
-8607 
«9647 





0-9684 
-9717 
-9747 
-9775 
-9799 


0-9822 
-9842 
9860 
-9876 
9890 


0-9903 










0-2964 
-3198 
-3437 
-38680 
-3927 


0:4175 
4425 
4676 
4928 
-5171 


0-5415 
-5656 
-5892 
:6124 
-6350 


0-6569 
"6782 
-6988 
-7186 
-7378 


0:7558 
"T1132 
"1898 
-8055 
8204 


0-8345 
-8477 
-8602 
8718 
8821 


0-8929 
«9024 
9112 
-9193 
-9269 


0:9338 
9402 
9460 
-9514 
-9583 


0-9608 
-8649 
-9686 
"9719 
-9750 


0-9777 





0-2443 
-2665° 
-2894 
«3150 
-3372 


0-3617 
-3867 
4119 
-4372 
-4625 


0-4878 
5129 
-5378 
-5623 
-5864 


0-6099 
-6329 
-6553 
-6769 
6978 


0-7180 
“7373 
-7558 
"77356 
"1902 


0-8062 
8212 
-8355 
-8488 
8614 


0-8731 
-8841 
-8943 
-9038 
-9126 


0-9208 
-9283 
-93562 
-9416 
-9474 


0-9527 
-9575 
-9620 
-9660 
9696 


0-9729 











Table 1 (cont.). Probability integral of the range W in normal samples of size n 











W | n = 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20
2.50 | 0.2007 | 0.1644 | 0.1342 | 0.1094 | 0.0890 | 0.0722 | 0.0586 | 0.0474 | 0.0383 | 0.0309
2-55 | -2213 | -1833| -1514| -1247| «1026 | -0842 0690 | -0565 | -0462 | -0377 
2-60 :2429 :2033 -1697 -1413 -1174 -0974 -0807 -0668 -0552 -0455 
2:65 +2853 +2243 “1891 1691 -1336 +1120 -0937 "07889 0654] -0545 
2-70 | «288ῦ | -2462 | -2096 | «1780 | -1509 | -1278| -1080| «08611 | «0768 «0647 
2-75 | 0:3124 | 0-2690 | 0-2311 | 0:1981 | 0-1696 | 0-1449 | 0-1236' | 0-1053 | 0-0896 | 0-0761 
2-80 +3368 +2926 +2536 +2194 -1894 :1632 :1405 :1208 :1037 -0889 
2-85 -8617 :3169 :2710 :2416 -2103 -1829 1587 "1976 +1192 :1031 
2:90 +3870 -3417 3011 +2647 -2324 -2036 -1782 «1558 -1360 1186 
2:95 :4126 -3670 | ^3258 :2887 -2554 -2255 «1988 -1752 1543 1355 
3.00 | 0-4382 | 0-3927 | 0-3512 | 0-3134 | 0-2792 | 0-2484 | 0-2207 | 0-1959 | 0-1737 | 0-1538 
3.05 | -4639 | -4186 | «9769 | -3387 | -3039 | -2723 | -2436 | .2178 | +1944) «17834 
3-10 -4895 -4446 4029 +3646 +3292 +2970 «26076 +2407 +2164 1943 
3-15 “561560 +4706 -4292 -3907 "3051 -3224 "2028 -2647 +2394 2164 
3-20 +6401 “4965 “4555 -4171 -3814 -3483 “3177 «2506 +2635 :2396 
3-25 | 0-5649 | 0-5222 | 0-4817 | 0-4437 | 0-4081 | 0-3748 | 0-3438 | 0-8151 | 0:2886 | 0-2638 
330 -5893 "5475 -5078 4108 +4348 4016 +3704 :8413 «9142 +2890 
3.35 *6131 -5725 -5337 -4967 “4617 +4286 -39074 :8681 -3407 | -3150 
3-40 «6969 -5970 -5592 -5230 "4886 "4557 -42486 +3953 +3677 -3417 
3-45 -8589 -8209 -5842 -5489 «5151 -4827 4510 -4227 -3950 -3689 
3-50 | 0:6807 | 0-6442 | 0-6087 | 0-5744 | 0-5413 | 0-5096 | 0-4792 | 0-4502 | 0-4226 | 0-3964 
-3-55 | 7017 | -6668 | -8326 | -5994| -5672 | 5362 | -5063 | -4777 | 4504| -4242 
3-60 -7220 -6886 "6658 -6237 -5926 +5624 5332 6051 -4781 4522 
3-65 | -7414 | -7096 | «6782! -6474 | -6173 | -5881 | -5596 | -5321 | -5056 | -4801 
3.70 | 7600 | -7298 | -6998 | -6704 | -86414 | -6132 | -δ856 | -5588 | «6929 | -5078 

« 3-75 | 0-7776 | 0-7491 | 0:7206 | 0-6925 | 0-6648 | 0-6376 | 0-6110 | 0-5850 | 0-5598 | 0-5352 
3.80 | .7944 | -7675 | -7400 | -7138 | +6873 | «6618 | -6357 | -8106 | -5861 | -5622 
3-85 | -8103 | -7850 | 7596 | -7342 | -7090 | -6841 |. -6596 | -6355| 6118 -5887 
3-90 8954. -8016 "PUPPI "1537 12988 “1061 +6827 -6596 -6369 -8145 
3-95 8396 -8173 "1948 -7723 "1497 :7278 -7050 -6829 6611 6397. 
4:00 | 0:8528 | 0-8321 | 0-8111 | 0-7899 | 0-7686 | 0-7474 | 0-7263 | 0-7053 | 0-6845 | 0-6640 
4:05 8053 «84600 8264 -8065 -7866 -7866 "7406 7208 -7070 «874 
4:10 -8769 -8590 -8408 -8223 -8036 "1848 -7660 "1412 1285 7099 
4:15 -8878 :8712 -8543 -8371 +8196 “8021 "1844 "1067 "1491 "1816 
4:20 -8978 -8826 -86869 -8509 88347 8188 8018 “7862 "7686 7520 
4-25 | 0-9072 | 0-8931 | 0-8787 | 0-8639 | 0-8488 | 0-8336 | 0-8182 | 0-8027 | 0-7871 | 0-7715 
4-30 9159 9029 :8896 -8760 -8620 -8479 -8336 8191 -8046 -7899 
4-35 9238 -9120 -8998 -8872 -8744 -8613 -8480 -8345 -8210 -8073 
4-40 0513 9204. -9092 -8976 -8858 -8737 -8614 -8490 88364 «8201 
4.45 | -9379 | -9281 | -9178 | -9073 | -8984 | 8853 | -8740 | «86256 | «8508 | -8391 
4-50 | 0-9441 | 0-9352 | 0-9258 | 0-9162 | 0-9062 | 0-8960 | 0-8856 | 0-8750 | 0-8643 | 0-8534 
4:55 -9498 9417 -9332 -9244 -9153 -9060 8984 -8867 -8768 -8667 
4-60 «9660 -8476 -9399 -9319 -9236 | -9151 9064 -8975 -8884 -8791 
4:65 -9597 -9530 -9460 -9388 9313 9286 9166 9074 “8991 -8906 
4:70 9640 «9679 «98616 -9451 -9383 -9312 9240 -9165 -9090 9012 
4-75 | 0:9678 | 0-9624 | 0-9567 | 0-9508 | 0-9446 | 0-9383 | 0-9317 | 0-9249 | 0-9180 | 0-9110 
4:80 “9713 9666 «9614 | «9560 -9505 -9447 -9387 9326 -9264 8199 
4-85 -9745 -9702 «9656 «9608 -9558 -9505 -09452 -93986 -9340 -9281 
4-90 "9774 -9735 -9694 «9660 «9606 «9669 -9510 | -9460 -9409 -9356 
495 | -9799 | -97865 | -9728 | -9689 | -9649| «06071 -9563| -9518 | 9472 | 9424 
5.00 | 0.9822 | 0.9791 | 0.9759 | 0.9724 | 0.9688 | 0.9650 | 0.9611 | 0.9571 | 0.9529 | 0.9486




































































Table 1 (cont.). Probability integral of the range W in normal samples of size n



































































W | n = 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10
5.00 | 0.9996 | 0.9988 | 0.9977 | 0.9963 | 0.9946 | 0.9926 | 0.9903 | 0.9878 | 0.9851
5-05 | -9996 | -9990| -9980| «9967 | -9952| -9935| -9915| -9893| -9869 
5310 | -9997 | :9991 | -9982| -9971| -9958| -9942 -9925| -9906| -9884 
Β155 -0997| -9992| -9985 -9975| -9963| -9950| -9934| -9917 -9898 
5:20 | -9998 | -9993 | -9986| -9978| -9968| -9956 -9942| -9927| -9911 
5-25 | 0-9998 | 0-9994 | 0-9988 | 0-9981 | 0-9972 | 0-9961 | 0-9949 | 0-9936 | 0-9922 
5-30 | -9998 | -9995 | -9990| -9983| -9975| -9966| -9956| -9944| -9931 
5-35 | -9998 | -9995 | -9991 | -9985| -9979| -9971 | «9961 | -9951| -9940 
5-40 | 99909 | -9996 | -9992| -9987| -9981| «9074 | -9966| -9957| -9948 
5-45 | -9999| -9997 | -9993 | -9989| -9984| -9978| «9071 | «9968 | -9954 
5-50 | 0-9999 | 0-9997 | 0-9994 | 0-9991 | 0-9986 | 0-9981 | 0-9975 | 0-9968 | 0-9960 
5-55 | -9999| «9007 | -9995| -9992| -9988 -9983| -9978| «09075 ] -9965 
5-60 |. -9099 | -9998| -9996 | .9993 | -9989| -9985| «9981 | -9976| -9970 
5-65 | -9999| -9998| .9996 | -9994| «0091! -9987| -9983| -9979| -9974 
5-70 | 0-9999 | -9998 | -9997 | -9995 | -9992 | -9989 | «0996! -9982| -9977 
5-75 | 1:0000 | 0-9999 | 0-9997 | 0-9995 | 0-9993 | 0-9991 | 0-9988 | 0-9984 | 0-9981 
5.80 -9999 | -9998 | «9996 | -9994 | -9992 | «9989 | .9980 -9987 
5-85. -9999 | -9998 | -9997 | -9995 | -9993 | -9991 | -9988 | -9986 
5-90 -9999 | -9998 | «9907 | -9996 | -9994 | -9992 | -9990 | -9988 
5-95 -9999 | -9998 | «0998 | -9996 | -9995 | «9908 | -9991 | -9989 
6-00 0-9999 | 0-9999 | 0-9998 | 0-0997 | 0-9996 | 0-9994 | 0-9993 | 0-9991 
605 -0999 | -9999 | -9998 | -9997 | -9996 | -9995| -9994| -9992 
6-10 0-9999 | -9999 | -9998 -9998| -9997| -9998| -9995| -9993 
6-15 1-0000 | «9999 | «9999 | -9998 | -9997| «9996 | -9995| -9994 
6:20 -9999 | -9999 | -9998 | -9998 | :9997| -9996 | -9998 
6-25 0-9999 | 0-9999 | 0-9999 | 0-9998 | 0-9997 | 0-9997 | 0-9996 
6 30 0-9999 | -9999 | -9999 | -9998 | «9998 | -9997 | -9996 
6-35 1-0000 | -9999 | -9999 | -9999 | «9998 | -9998 | -9997 
6-40 0-9999 | -9999 | -9999 | -9998 | -9998 | -9997 
6-45 10000] -9999 | -9999 | «9909 | -9998| -9998 
6:50 0-9999 | 0-9999 | 0-9999 | 0-9999 | 0-9998 
6°55 0-9999 | «9990 | «9099 | «9099 | -9998 

j 6-60 10000] -9999 | «9999 | «9099 | -9999 
6-65 E 2 0-9999 | -9999 | -0999 | -9999 
6:70 1-0000 | 0-9999 | -9999 | -9999 
6:75 ` 1:0000 | 0-9999 | 0-9999 
6-80 - 0-9999 | -9999 
6-85 1-0000 | -9999 
6-90 0-9999 
6:05 1-0000 
7-00 
7-05 
710 
715 
7:20 
7-25 





























Table 1 (cont.). Probability integral of the range W in normal samples of size n
























































































W | n = 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20
5.00 | 0.9822 | 0.9791 | 0.9759 | 0.9724 | 0.9688 | 0.9650 | 0.9611 | 0.9571 | 0.9529 | 0.9486
5:05 -9843 | -9815 | -9786| -9756 | -9723| «0690 | -9655 | -9618 | «0681 | -9543 
5:10 -9861 | -9837 | -9811 | -9784 | -9755| -9725| 0694 | «9661 | «0658 | -9593 
5-15 «9876 | -9856 | -9833 | -9809 | 9783 | -9757 | -9729| -9700| -9670| -9639 
5:20 -9893 | -9874 | -9853 | -9832 | -9809 | -9785 -9760| -9735| «0708 | -9681 
5-25 | 0-9908 | 0-9889 | 0-9871 | 0-9852 | 0-9832 | 0-9811 | 0-9789 | 0-9768 | 0-9742 | 0-9718 
5:30 -9917 | -9903 | -9887 | 9870! -9852| 0833! -θ8ἱ4 | :0794 | 9773| -9751 
5:35 :9928 | -9915 | -9801 | -9886 | -9870 | -9854| -9836| .9819| «9800 | -9781 
5-40 -8937 | -9925 | -9913 | -9800 | -9886 | -8872| -9856| «0841 | -9824| -9807 
5-45 9945 | -9935| -9924| -9912| -9900| -9888| «0874 | -9860| -9846| -9831 
5-50 | 0-9952 | 0-9943 | 0-9934 | 0-8924 | 0-9913 | 0-9902 | 0-9890 | 0-9878 | 0-9865 | 0-9852 
5-55 ‘9958 | -9951 | -9942 | -9933 | -9924 | -9914| -9904| -9893| «988» | -9870 
5-60 ‘9064 | -9957 | -9950 | -9942 | 9934 | -9925| -9916 | -9907]| -9897 | -9887 
6-65 «90969 | «9965 | -9956 | «9960 | -9943 | -9935 | -9927 | +9919 | -9910 | -9901 
5-70 -9973 | -9968 | -9962 | -9956 | -9950 | -9944 | -9937 | -9929 | -9922 | -9914 
5:75 | 0:9976 | 0-9972 | 0-9967 | 0.9962 | 0-9957 | 0-9951 | 0-9945 | 0-9939 | 0-9932 | 0-9925 
6:80 «9980 | -9976 | -9972 | -9967 | -9963 | -9958 | -9952 | -9947 | -9941 | -9935 
5:85 «0985 | -9979 | -9976 | -9972 | «9968 | -9963 | -9959 | -9954] «99496 | -0044 
5-90 9986] -9982 | -9979 | -9976 | -9972 | -9968 | -9964 | «9960 | -9956 | «0965 
5:95 9987) -9985 | -9982 | -9979 | -9976 |} -9973 | «9969 | «0966 | -9962 | -9958 
6-00 | 0-9989 | 0-9987 | 0-9984 | 0-9982 | 0-9979 | 0-9977 | 0-9974 | 0-9971 | 0-9967 | 0-9984 
6-05 -9990 | -9989 | -9987 | -9984 | -9982 | «0980! -9977| -9975| 9972| «0969 
6:10 0999! -9990 | -9989 | -9987 | -9985| -9983| -9981| -9978| «0076 ! -9973 
6-15 -0993 | -9992 | -9990 | -9989| -8987 | -9985 | -9983| -9981| -9979| -9977 
6:20 -9994 | -9993 | -9992 | -9990 | «9089. -9987 | -9986 | -9984| «09862 | -9980 
6-28 | 0-9995 | 0-9994 | 0-9993 | 0-9992 | 0-9991 | 0-9989 | 0-9988 | 0-9986 | 0-9985 | 0-9983 
6:30 -9996 | -9995 | -9994 | -9993 | -9992 | -9991 | -9990 | «9988 | -9987 | -9986 
6:35 -9006 | -9996 | :9905 | -9994 | -9993 | -9992 | -9991 | -9990 | -9989 | «9988 
6:40 «9907 | -9996 | -9996 | -9995 | -9994 | -9993 | -9992 | 9992 | -9991 | -9990 
6:45 -9997 | -9997 | -9996 | -9996 | -9995 | -9994 | -9994 | -9993 | 9992 | -9991 
6:50 | 0-9998 | 0-9997 | 0-9997 | 0-9996 | 0-9996 | 0-9995 | 0-9995 | 0-9994 | 0-9993 | 0-9993 
6:55 «9998 | -9998 | -9997 | -9997 | -9996 | -9996 | -9995 | -9995 | 9994] -9994 
6-60 9998 | -9998 | -9998 | -9997 | -9997 | -9997 | -9996 | -9996 | -9995; -9995 
6:65 9999] -99988 | -9998 | «99981 -9997 ! -9997 | -9997 | «996 | «9996 | -9995 
6:70 «9099 | -9999 | -99898 | -9998 | 9998 | -9998 | -9997 | -0997 | -9997 | «9906 
6-75 | 0-9999 | 0-9999 | 0-9999 | 0-9999 | 0-9998 | 0-9998 | 0-9998 | 0-9997 | 0-9997 | 0-9997 
6:80 9999] -9999 | -0999 | -9999| -9998 | -9998 | -9998| -9998| «99898! -9997 
6-85 -9999 | -9999 | -9999 | -9999| «90098 | -9999 -9998| -9998| -9998| -9998 
6:90 | 0-0999 | -9999 | -9999 | -9999| -9999| -9999| -9999| -9998 | 0098 | -9998 
6:95 | 1-0000 | 0-9999 | -9999 | -9999| -9999| -9999| «0099 | -9999 -9999| -98998 
7-00 1-0000 | 0-9999 | 0-9999 | 0-9999 | 0-9999 | 0-9999 | 0-9999 | 0-9999 | 0-9999 
7:05 0.0000 | 0-9999 | -9999 | -9999 | -9999 | -9999 | «9909 | «9999 
7-10 1-0000 | 1-0000 0-9999 | -0999 | -9999 | -9999 | -9999 
7:15 1:0000 | 0-9999 | 0-9999 | -9999 | «9909 
7:20 1-0000 | 1-0000 | 0-9999 | 0-9999 
7:25 1-0000 | 1-0000 























308 E. S. PEARSON 


3. The origin of the present tables 


Tables giving the expected or mean value and the standard deviation of range in random
samples from the normal population of equation (1) were calculated by L. H. C. Tippett
(1925) in the Department of Applied Statistics, University College, London. Since the
probability distribution $f_n(w)$ is itself far from normal in form, it was evident that its mean
and standard deviation alone would not provide all the information generally needed in
practice. Tippett included in his paper some values of the constants $\beta_1$ and $\beta_2$ of the
distribution and his work was extended by the present writer (Pearson, 1926, 1932) who








Table 2

Size of sample | Factor | Lower percentage points | Upper percentage points
n | a_n | 0.1 | 0.5 | 1.0 | 2.5 | 5.0 | 10.0 | 10.0 | 5.0 | 2.5 | 1.0 | 0.5 | 0.1
2 | 0.8862 | 0.00 | 0.01 | 0.02 | 0.04 | 0.09 | 0.18 | 2.33 | 2.77 | 3.17 | 3.64 | 3.97 | 4.65
3 | 0.5908 | 0.06 | 0.13 | 0.19 | 0.30 | 0.43 | 0.62 | 2.90 | 3.31 | 3.68 | 4.12 | 4.42 | 5.06
4 | 0.4857 | 0.20 | 0.34 | 0.43 | 0.59 | 0.76 | 0.98 | 3.24 | 3.63 | 3.98 | 4.40 | 4.69 | 5.31
5 | 0.4299 | 0.37 | 0.55 | 0.66 | 0.85 | 1.03 | 1.26 | 3.48 | 3.86 | 4.20 | 4.60 | 4.89 | 5.48
6 | 0.3946 | 0.54 | 0.75 | 0.87 | 1.06 | 1.25 | 1.49 | 3.66 | 4.03 | 4.36 | 4.76 | 5.03 | 5.62
7 | 0.3698 | 0.69 | 0.92 | 1.05 | 1.25 | 1.44 | 1.68 | 3.81 | 4.17 | 4.49 | 4.88 | 5.15 | 5.73
8 | 0.3512 | 0.83 | 1.08 | 1.20 | 1.41 | 1.60 | 1.83 | 3.93 | 4.29 | 4.61 | 4.99 | 5.26 | 5.82
9 | 0.3367 | 0.96 | 1.21 | 1.34 | 1.55 | 1.74 | 1.97 | 4.04 | 4.39 | 4.70 | 5.08 | 5.34 | 5.90
10 | 0.3249 | 1.08 | 1.33 | 1.47 | 1.67 | 1.86 | 2.09 | 4.13 | 4.47 | 4.79 | 5.16 | 5.42 | 5.97
11 | 0.3152 | 1.20 | 1.45 | 1.58 | 1.78 | 1.97 | 2.20 | 4.21 | 4.55 | 4.86 | 5.23 | 5.49 | 6.04
12 | 0.3069 | 1.30 | 1.55 | 1.68 | 1.88 | 2.07 | 2.30 | 4.29 | 4.62 | 4.92 | 5.29 | 5.54 | 6.09

























































Estimate of $\sigma = a_n \times$ range (or mean range) in a sample of n observations.


developed an approximate method of determining probability levels for w and provided
some provisional tables of these. The need has, however, been felt for some time for a full
and accurate table of the probability integral of the range to fit into place among other
fundamental tables associated with the normal distribution. The completion of this objective has
been made possible by a grant from the Department of Scientific and Industrial Research,
whose assistance in the matter is acknowledged with warm appreciation. The actual method
of computation was planned by Dr H. O. Hartley and the calculations were carried out under
his supervision by Scientific Computing Service, Ltd. The scope of the main table was
limited to n ≤ 20. As n increases beyond this value there is an increasing risk that the table
may be misleading in practice, since $f_n(w)$ becomes very sensitive to relatively slight
departures from normality in the tails of the population distribution.
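
The note under Table 2 gives the estimate of σ as a_n times the range. The following is a minimal sketch of that rule, not part of the original paper: the helper name `estimate_sigma` and the sample values are illustrative assumptions, and only the factors for n = 2, 3 and 4 are copied in.

```python
import math

# Factors a_n for n = 2, 3, 4 as printed in Table 2
# (a_n is the reciprocal of the mean range of n standard normal observations).
A_N = {2: 0.8862, 3: 0.5908, 4: 0.4857}


def estimate_sigma(sample):
    """Estimate sigma as a_n times the sample range (hypothetical helper)."""
    n = len(sample)
    if n not in A_N:
        raise ValueError("factor a_n not included in this sketch for n = %d" % n)
    return A_N[n] * (max(sample) - min(sample))


# For n = 2 the mean range is 2/sqrt(pi), so a_2 should equal sqrt(pi)/2.
a2_exact = math.sqrt(math.pi) / 2.0

# Made-up sample of four observations, used only to show the call.
sigma_hat = estimate_sigma([10.1, 9.4, 10.8, 9.9])
```

Here the sample range is 1.4, so `sigma_hat` is 0.4857 times 1.4; `a2_exact` agrees with the tabled 0.8862 to four decimals.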


REFERENCES 


Davies, O. L. & Pearson, E. S. (1934). J.R. Statist. Soc. Suppl. 1, 76.
Dudding, B. P. & Jennett, W. J. (1942). The Application of Statistical Methods to
Quality Control. British Standards Institution, No. 600 (1942).
McKay, A. T. & Pearson, E. S. (1933). Biometrika, 25, 415.
Pearson, E. S. (1926). Biometrika, 18, 173.
Pearson, E. S. (1932). Biometrika, 24, 404.
Pearson, E. S. (1935). The Application of Statistical Methods to Industrial Standardization and
Quality Control. British Standards Institution, No. 600 (1935).
Tippett, L. H. C. (1925). Biometrika, 17, 364.










II. NUMERICAL EVALUATION OF THE PROBABILITY INTEGRAL
By H. O. HARTLEY


The formula used for the tabulation of the probability integral $P_n(W)$ of the range in normal
samples of size n is given in the paper printed on pp. 334-48 below, where it is proved that

$$P_n(W) = \left(\int_{-W/2}^{W/2} z(x)\,dx\right)^{n} + 2n\int_{W/2}^{\infty} z(u)\left(\int_{u-W}^{u} z(x)\,dx\right)^{n-1} du, \tag{1}$$

where $z(x) = (2\pi)^{-1/2}\, e^{-x^2/2}$.


Certain properties of this formula and the facilities provided by certain modern calculating
machines make this integral amenable to tabulation.
The main work consists in the evaluation of the integral

$$2n\int_{W/2}^{\infty} z(u)\left(\int_{u-W}^{u} z(x)\,dx\right)^{n-1} du, \tag{2}$$

by quadrature for a two-variable network of values of n and W. The range of integration is
from $\tfrac{1}{2}W$ up to a point where the integrand

$$z(u)\left(\int_{u-W}^{u} z(x)\,dx\right)^{n-1} \tag{3}$$

vanishes to 7-decimal accuracy.* For each point of the network n, W, therefore, the integrand
(3) was tabulated for a set of equidistant values of u covering the range of integration. The
interval of integration was chosen as $\Delta u = 0.2$ throughout. This was sufficient to obtain
about 5-decimal accuracy in the integral (2).
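
As a modern cross-check on formula (1), the integral can be evaluated directly. The sketch below is an illustration under stated assumptions, not the procedure used for the table: it replaces the interval Δu = 0.2 with Gregory corrections by Simpson's rule on a much finer grid, and the function names, the step and the cut-off `span` are choices made here.

```python
import math


def normal_area(lo, hi):
    """Integral of the standard normal density z(x) from lo to hi."""
    root2 = math.sqrt(2.0)
    return 0.5 * (math.erf(hi / root2) - math.erf(lo / root2))


def z(u):
    """Standard normal density."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def range_cdf(n, w, step=0.01, span=10.0):
    """P_n(W) from formula (1); Simpson's rule is an assumption of this sketch."""
    if w <= 0.0:
        return 0.0
    first = normal_area(-0.5 * w, 0.5 * w) ** n
    m = int(round(span / step))
    if m % 2:
        m += 1                      # Simpson's rule needs an even number of panels
    h = span / m
    start = 0.5 * w                 # lower limit of integral (2)
    total = 0.0
    for i in range(m + 1):
        u = start + i * h
        f = z(u) * normal_area(u - w, u) ** (n - 1)   # integrand (3)
        weight = 1 if i in (0, m) else (4 if i % 2 else 2)
        total += weight * f
    return first + 2.0 * n * total * h / 3.0
```

For n = 2 the result can be compared with the closed form erf(W/2), and for larger n with entries of Table 1, for instance `range_cdf(20, 5.0)` against the tabled 0.9486.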

The interval in W was taken as wide as possible but sufficiently fine to permit checking
by differencing and the subsequent subtabulation of $P_n(W)$ to interval 0.05, which is the
interval in the final table. An interval of $\Delta W = 0.25$ was therefore chosen for the n, W
network.

For small values of W it was necessary to tabulate the integrand for all integers n for
which $P_n(W)$ is required in the final table. For larger values of n and W, however, it was
sufficient to calculate the final integral (1) for odd n and to obtain intermediate values by
interpolation. Below, then, is shown the two-variable network for which the integral (2)
was produced by quadrature:

W = 0.00 (0.25) 1.25 and n = 3 (1) 20.
W = 1.50 (0.25) 2.75 and n = 3 (1) 9 (2) 23.        (4)
W = 3.00 (0.25) 3.25 and n = 3 (1) 5 (2) 23.
W = 3.50 (0.25) 8.00 and n = 3 (2) 23.


For n = 2 the final integral $P_2(W)$ is given directly by the normal integral and may be
obtained by interpolation in Table II of Tables for Statisticians and Biometricians, Part I.
Using the notation of that table (Sheppard’s original notation) we have

$$P_2(W) = \alpha_x, \qquad x = W/\sqrt{2}.$$

Moreover, for purposes of interpolation, use was made of the formal relation

$$P_1(W) = 1 \quad \text{for } W > 0.$$
* The integrand was calculated to 7-decimal accuracy in order to obtain $P_n(W)$ to about
5-decimal accuracy.

For fixed u and W and for values of n in the progressions (4), the integrand (3)
is a geometric progression with

$$z(u)\left(\int_{u-W}^{u} z(x)\,dx\right)^{2}$$

as leading term and

$$\int_{u-W}^{u} z(x)\,dx \quad \text{or} \quad \left(\int_{u-W}^{u} z(x)\,dx\right)^{2}$$

as common ratio. This leading term as well as the common ratios were easily obtained from
Table II of Tables for Statisticians and Biometricians, Part I and the terms of the progression
were then automatically produced on a Mercedes calculating machine Model 38 M.S. and
copied down in two-way tables with u as row heading, n as column heading and W as
table heading. The values of the integrand were then checked by differencing column-wise
and added to yield the main term of the integral (2). The correction terms which, according
to Gregory’s formula, convert the integrand-sum into the integral were calculated from the
differences and checked by the application of Gauss’ formula of integration. Finally, to
obtain $P_n(W)$ the term

$$\left(\int_{-W/2}^{W/2} z(x)\,dx\right)^{n}$$

was produced by continued multiplication on the Mercedes and added to the corresponding
integral (2) to yield $P_n(W)$ for all points of the above network.
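
The step from an integrand-sum to an integral by Gregory's formula can be sketched as follows. This is an illustration only: it keeps just the first two end-corrections (first and second differences), whereas the text does not say how many difference terms were actually carried, and the function name is an assumption.

```python
import math


def gregory_integral(values, h):
    """Trapezoidal sum plus the first two Gregory end-corrections.

    values: ordinates at equal spacing h. Only corrections in first and
    second differences are kept; this truncation is an assumption here.
    """
    if len(values) < 3:
        raise ValueError("need at least three ordinates")
    f = values
    trapezoid = h * (sum(f) - 0.5 * (f[0] + f[-1]))
    # First differences at the two ends of the table.
    d1_start = f[1] - f[0]
    d1_end = f[-1] - f[-2]
    # Second differences at the two ends of the table.
    d2_start = f[2] - 2.0 * f[1] + f[0]
    d2_end = f[-1] - 2.0 * f[-2] + f[-3]
    correction = (-h / 12.0) * (d1_end - d1_start) - (h / 24.0) * (d2_end + d2_start)
    return trapezoid + correction


# Check on exp(x) over [0, 1] with spacing 0.1, where the integral is e - 1.
ordinates = [math.exp(0.1 * i) for i in range(11)]
plain_sum = 0.1 * (sum(ordinates) - 0.5 * (ordinates[0] + ordinates[-1]))
corrected = gregory_integral(ordinates, 0.1)
```

On this check the corrected value lies much closer to e − 1 than the plain trapezoidal sum, which is the effect the correction terms are meant to have on the tabulated integrand.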

For odd values of n the integral $P_n(W)$ was then differenced W-wise on the National
machine which, incidentally, produced column totals $\sum_W P_n(W)$ for these values of n. Two
checks were applied at this stage. One consisted in inspecting the fourth order differences.
As a second check, the mean range, $\bar{w}_n$, was calculated from the formula

$$\bar{w}_n = 8 - \int_0^8 P_n(w)\,dw,^*$$

and compared with the correct mean range given in Table XXII of Tables for Statisticians
and Biometricians, Part II. Finally, the function $P_n(W)$ was subtabulated to interval 0.05
on the National machine by a method similar to that described in detail by L. J. Comrie
(1938).
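
The mean-range check can be reproduced numerically. The sketch below assumes the usual form of the range integral (a central term plus twice n times an integral over the upper tail) and uses Simpson's rule in both variables; the step sizes and function names are choices made for this illustration, not those of the original computation.

```python
import math


def normal_area(lo, hi):
    """Integral of the standard normal density from lo to hi."""
    root2 = math.sqrt(2.0)
    return 0.5 * (math.erf(hi / root2) - math.erf(lo / root2))


def range_cdf(n, w, step=0.02, span=9.0):
    """Probability that the range of n standard normal values is at most w."""
    if w <= 0.0:
        return 0.0
    m = int(round(span / step))
    m += m % 2                       # even number of Simpson panels
    h = span / m
    total = 0.0
    for i in range(m + 1):
        u = 0.5 * w + i * h
        dens = math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)
        weight = 1 if i in (0, m) else (4 if i % 2 else 2)
        total += weight * dens * normal_area(u - w, u) ** (n - 1)
    return normal_area(-0.5 * w, 0.5 * w) ** n + 2.0 * n * total * h / 3.0


def mean_range(n, upper=8.0, step=0.05):
    """Mean range as upper minus the integral of P_n from 0 to upper."""
    m = int(round(upper / step))
    m += m % 2
    h = upper / m
    total = 0.0
    for i in range(m + 1):
        weight = 1 if i in (0, m) else (4 if i % 2 else 2)
        total += weight * range_cdf(n, i * h)
    return upper - total * h / 3.0
```

For n = 2 the result should agree with the exact mean range 2/√π ≈ 1.1284, and for n = 5 with the familiar value 2.326 (the reciprocal of the factor a_5).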

The values of $P_n(W)$ for even n were then obtained by interpolation with the help of
two interpolation formulae of Lagrangian type:

$$2048\,P_n(W) = -5[P_{n-7}(W) + P_{n+7}(W)] + 49[P_{n-5}(W) + P_{n+5}(W)] - 245[P_{n-3}(W) + P_{n+3}(W)] + 1225[P_{n-1}(W) + P_{n+1}(W)], \tag{5}$$

$$20\,P_n(W) = P_{n-3}(W) + P_{n+3}(W) - 6[P_{n-2}(W) + P_{n+2}(W)] + 15[P_{n-1}(W) + P_{n+1}(W)]. \tag{6}$$

Formula (5) yields the interpolate for even n from the given values of $P_n(W)$ at adjacent odd
values of n. This formula was used throughout. In some cases, however, the resulting
interpolate was accurate to about 3 places of decimals only. In such cases values of $P_{n-3}$, $P_{n+3}$,
$P_{n-1}$, $P_{n+1}$, accurate to 5 places of decimals, and values of $P_{n-2}$, $P_{n+2}$, accurate to (say)
3 places of decimals, were substituted in formula (6). This yielded a ‘corrected value’ of
$P_n(W)$. The process was then repeated for $n \pm 2$ and so on until all values of $P_n(W)$
had ‘settled down’ for even values of n. It is easy to see that the process is convergent and
that the maximum error in the interpolate is 2 units in the 5th decimal.
After completion of the interpolation n-wise, the interpolates $P_n(W)$ for even n were
differenced W-wise, checked and subtabulated as for odd values of n.
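
The two Lagrangian formulae can be written out as short functions. In this sketch the weights are the standard ones for eight-point mid-interval interpolation (summing to 2048) and for a vanishing sixth difference (summing to 20); the function names and the cubic test table are illustrative assumptions.

```python
def interpolate_even(p, n):
    """Eight-point formula: value at n from values at n-7, n-5, ..., n+7."""
    return (-5.0 * (p[n - 7] + p[n + 7])
            + 49.0 * (p[n - 5] + p[n + 5])
            - 245.0 * (p[n - 3] + p[n + 3])
            + 1225.0 * (p[n - 1] + p[n + 1])) / 2048.0


def correct_even(p, n):
    """Six-point formula: value at n from values at n-3, ..., n+3 (n excluded)."""
    return (p[n - 3] + p[n + 3]
            - 6.0 * (p[n - 2] + p[n + 2])
            + 15.0 * (p[n - 1] + p[n + 1])) / 20.0


# Both formulae reproduce a cubic exactly, which gives a simple check.
table = {k: float(k ** 3) for k in range(1, 24)}
first_pass = interpolate_even(table, 10)
second_pass = correct_even(table, 10)
```

In the procedure described above, `p` would hold tabulated values of P_n(W) for one fixed W: the first function needs odd-n entries only, while the second also uses the provisional even-n entries at n − 2 and n + 2.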


* This is true provided $P_n(8) = 1$ to 6-decimal place accuracy.


REFERENCE
Comrie, L. J. (1938). J.R. Statist. Soc. Suppl. 3, 87-114.


NOTES ON TESTING STATISTICAL HYPOTHESES 
By E. S. PEARSON 


1. In July 1939, a few weeks before the opening of the present war, a Con- 
ference on the Application of the Calculus of Probabilities was held at Geneva 
under the auspices of the International Institute of Intellectual Co-operation 
(League of Nations). At the public session at which a paper by Prof. J. Neyman 
was presented and also subsequently in some informal discussions, a number 
of questions were raised: 

(a) In choosing a test for a statistical hypothesis, is it possible or even necessary
to specify the hypotheses alternative to that tested? Why should not a test be 
made to depend only on the form of law associated with the hypothesis tested? 
For example, Newton's hypothesis of gravitation was formulated and tested 
without any need to define alternative laws. 

(b) Is the method of approach to these problems advocated by Prof. Neyman 
and myself applicable to testing the appropriateness of probability laws or only 
to testing hypotheses regarding the numerical values of constants contained in 
these laws? 

After the conclusion of the conference, I set down some Notes for a few of the 
statisticians who had taken part in the discussions, hoping that at leisure they 
might feel stimulated to define their views on the subject more precisely. But 
almost before the Notes were despatched, war in Europe had intervened. The 
only reply which I received was from Prof. Gumbel, and this, after some unavoidable
delay, has now taken shape in the contribution printed on pp. 317-33
below. In publishing this, it seems useful to add my own Notes, which are given
with only minor verbal alterations in the following pages. They are in part a 
restatement with rather different emphasis of views expressed in a paper published 
four years ago (Pearson, 1938). 

2. With regard to one of the points raised under (a) above, it should be
remembered that a statistical hypothesis as defined by Neyman and myself is
a hypothesis concerning the probability law of random variables. The gravitational
hypothesis of Newton is not a statistical hypothesis in the sense defined;
statistical methods may be introduced to test the Newtonian hypothesis, however,
and they will involve tests of statistical hypotheses or ‘significance tests’
because it will be assumed that errors of observation exist which may be regarded
as random variables, probably taken to follow the normal distribution law.

For example, on the Newtonian hypothesis, the angular co-ordinates of a
planet measured from the earth as origin may at certain moments be given as
$\xi = \xi_t$, $\eta = \eta_t$ ($t = 1, 2, \ldots$). If we have a number of observations of position $x_t$, $y_t$,
subject to observational error, the statistical problem will be to test whether
these are consistent with the hypothetical position values $\xi_t$, $\eta_t$ or whether they
suggest that $\xi$, $\eta$ have some other different values at the moments of observation.
Thus the ‘alternatives’ that we have immediately in mind will be alternative
values for $\xi$, $\eta$, not alternative gravitational hypotheses. If, however, some alternative
law of motion were proposed, so that we could specify definite values
$\xi'_t$, $\eta'_t$ alternative to the values $\xi_t$, $\eta_t$ of the Newtonian law, then undoubtedly we
could choose a statistical test which would be particularly efficient in discriminating
between the two alternatives. Such a course became possible when the
Einstein hypothesis was formulated and the orbit of Mercury considered. But
the absence of an alternative gravitational law does not prevent us selecting a
statistical test which will be (a) sensitive to departures in $\xi$, $\eta$ from $\xi_t$, $\eta_t$, but
(b) relatively insensitive to departures from normality in the distribution of
errors. We should make this selection because, if the Newtonian law were incorrect,
we believe that this would result in a change in $\xi$, $\eta$, but not in a departure of the
distribution of observational errors from the normal law.

This example, of course, concerns a statistical hypothesis regarding the values
of two parameters $\xi$, $\eta$, not regarding the form of a probability law of random
variables. The following general approach shows, however, that the principles
discussed may be applied to testing hypotheses regarding probability laws.

3. Suppose that x is a continuous random variable and that $H_0$ is a statistical
hypothesis which assumes that the elementary probability law for x is $p(x \mid H_0)$
in the interval $-\infty$ to $+\infty$. Thus

$$\int_{-\infty}^{+\infty} p(x \mid H_0)\,dx = 1. \tag{1}$$

Now write

$$y = \int_{-\infty}^{x} p(x \mid H_0)\,dx. \tag{2}$$

y will be a non-decreasing function of x having values confined to the interval
(0, 1). Further, the elementary probability law of y will be

$$p(y \mid h_0) = 1 \quad (0 \le y \le 1), \tag{3}$$

or all values of y between 0 and 1 are ‘equally probable’.
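
The transformation from x to y is easy to try numerically. In the sketch below the hypothesis tested is taken, purely as an illustrative assumption, to be the standard normal law; the sample is simulated and the function name is hypothetical.

```python
import random
from statistics import NormalDist


def probability_integral_transform(xs, law):
    """Map each x to y, the area of the assumed law to the left of x."""
    return [law.cdf(x) for x in xs]


rng = random.Random(1942)                    # fixed seed so the run is repeatable
assumed_law = NormalDist(mu=0.0, sigma=1.0)  # illustrative choice of the law tested
xs = [rng.gauss(0.0, 1.0) for _ in range(20000)]
ys = probability_integral_transform(xs, assumed_law)

mean_y = sum(ys) / len(ys)
share_below_quarter = sum(1 for y in ys if y < 0.25) / len(ys)
```

When the sample really follows the assumed law, the y values spread evenly over (0, 1): their mean is near 0.5 and about a quarter of them fall below 0.25.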

Suppose now that we wish to use a set of n independent values $x_1, x_2, \ldots, x_n$
to test that the probability law is of the assumed form $p(x \mid H_0)$. It is clear that
the hypothesis $H_0$ is exactly equivalent to the hypothesis, say $h_0$, that the n
values $y_1, y_2, \ldots, y_n$ (obtained from the x's by the transformation (2)) have been
sampled subject to the probability law (3). Just as the point $(x_1, x_2, \ldots, x_n)$ may
be represented in an unlimited n-dimensioned space having probability density

$$p(x_1, x_2, \ldots, x_n \mid H_0) = \prod_{i=1}^{n} p(x_i \mid H_0) \tag{4}$$

if $H_0$ is true, so the point $(y_1, y_2, \ldots, y_n)$ may be placed in an n-dimensioned hypercube
with sides of unit length and with uniform probability density, if $H_0$ and
therefore $h_0$ is true. It follows that if $H_0$ is what has been termed a ‘simple hypothesis’,
i.e. specifies the form of $p(x \mid H_0)$ completely,* then the test of $H_0$ may
always be transformed to the test of $h_0$. If then it were correct to say that the test
of a statistical hypothesis depends only on the form of the law specified by $H_0$, it
follows that for the type of situation considered the testing of a statistical hypothesis
could always be reduced to the following simple problem:

To test whether a sample of n independent random variables $y_1, y_2, \ldots, y_n$
($0 \le y_i \le 1$) has been selected from the so-called rectangular distribution, i.e. the
distribution for which $p(y) = 1$, ($0 \le y \le 1$).

4. We are at once faced, therefore, with the question of how to test this
simple but apparently fundamental hypothesis. If $h_0$ is true, the sample point
is equally likely to fall at any point within the n-dimensioned hypercube. Thus
in picking out the critical (or rejection) region in this space we can get no assistance
whatsoever from the changes in probability density, as we might do in
the x-space. If we wish to use a level of significance of $\alpha$ (say $\alpha = 0.01$) for rejecting
$h_0$, it is clear that an infinite number of critical regions satisfying this condition
are available; it is only necessary to select a region whose content is $\alpha$.

If we consider the n values of y and plot them in the interval (0, 1) as follows,

Fig. 1.

the great majority of samples, from a rectangular distribution, at any rate if n
is not too small, will be spread out fairly uniformly throughout the interval.
Perhaps an ‘ideal’ sample against which to measure irregularities might be
described as one for which the values of y fell at

$$\frac{1}{2n},\ \frac{3}{2n},\ \frac{5}{2n},\ \ldots,\ \frac{2n-1}{2n}.$$

But what form of departure from this ideal of uniformity are we to pick out as
suggesting that the hypothesis $h_0$ is disproved? Should we judge significance by
paying attention to the value of the mean $\bar{y}$, of the variance, of the range of
variation or of higher moments? Or should we use the $\chi^2$ or $\omega^2$ tests? It seems
difficult to find any basis for choice which could be regarded in any sense as the
‘best’. For any set of values $y_1, y_2, \ldots, y_n$ some critical region of size $\alpha$ can always
be found which will contain the sample point and therefore lead to the rejection
of $h_0$. Indeed, the task of selecting a unique region on any rational basis would
seem to be insoluble.


* This condition is important. If the values of certain constants contained in the probability
law need to be estimated from the observations, then the n values of y will not form a true random
sample from a rectangular distribution. They will be subject to certain limitations to their degrees
of freedom, though these may be relatively unimportant if n is large.




5. Directly it is recognized, however, that the choice of a test of a statistical
hypothesis depends on something more than the form of the law associated with
that hypothesis, it can be seen how a solution may be obtained. If we can specify
a single alternative $H_1$ to $H_0$ or a class of alternatives C(H), then we shall have
also an alternative $h_1$ or a class C(h) to $h_0$. Thus, if $p(x \mid H_1)$ denotes a probability
law alternative to $p(x \mid H_0)$, then for y the alternative is

$$p(y \mid h_1) = p(x \mid H_1)\,\frac{dx}{dy} = \left[\frac{p(x \mid H_1)}{p(x \mid H_0)}\right]_{x = f(y)}, \tag{5}$$

where f(y) means the solution of

$$y = \int_{-\infty}^{x} p(x \mid H_0)\,dx \tag{6}$$

with regard to x. For example, Fig. 2 shows three typical forms of alternative
$p(y \mid h_1)$, $p(y \mid h_2)$ and $p(y \mid h_3)$ associated with alternatives $p(x \mid H_1)$, ..., etc., to
$p(x \mid H_0)$.





Fig. 2. Solid rectangle represents $p(y \mid h_0)$.


We can now see the kind of test which will be most efficient for testing $H_0$
with regard to possible classes of alternatives. If the alternative laws are of
smaller dispersion (as $p(x \mid H_1)$), we must be on the look-out for too many values
of y near ½ and too few near 0 and 1. For alternatives with greater dispersion
(as $p(x \mid H_2)$), we must reject $H_0$ when there are too many y’s near 0 or 1 and too
few near ½. While if the alternatives are likely to be asymmetrical curves (as
$p(x \mid H_3)$), then a different rule will be needed, as suggested by the $p(y \mid h_3)$ curve.
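
The shapes described for the alternatives in the y-scale can be computed from the ratio of the two densities taken at the x that corresponds to y. The sketch below assumes, for illustration only, a standard normal law under the hypothesis tested and two normal alternatives with the same mean and standard deviations 0.5 and 2; these choices and the function name are not taken from the text.

```python
from statistics import NormalDist


def alternative_density(y, tested, alternative):
    """Density of y under the alternative: ratio of densities at x = f(y)."""
    x = tested.inv_cdf(y)          # solve y = area of the tested law left of x
    return alternative.pdf(x) / tested.pdf(x)


tested = NormalDist(0.0, 1.0)      # illustrative law under the hypothesis tested
narrow = NormalDist(0.0, 0.5)      # illustrative alternative, smaller dispersion
wide = NormalDist(0.0, 2.0)        # illustrative alternative, greater dispersion

centre_narrow = alternative_density(0.5, tested, narrow)
edge_narrow = alternative_density(0.05, tested, narrow)
centre_wide = alternative_density(0.5, tested, wide)
edge_wide = alternative_density(0.05, tested, wide)
```

With the narrower alternative the density of y is above 1 near ½ and below 1 near the ends of the interval; with the wider alternative the pattern is reversed, which matches the rules stated above.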

6. It follows that in so far as it is possible to formulate the class of admissible
probability laws $p(x \mid H)$, the problem of selecting the most efficient test of $H_0$
reduces to that of choosing a critical region in the n-dimensioned hypercube
which is most effective in detecting, from a sample of n values of y, differences
between the rectangle $p(y \mid h_0)$ and the appropriate alternative forms $p(y \mid h)$.




If H, is a single admissible alternative, then it has been shown (Neyman & 
Pearson, 1933, p. 298) that the region w, of content « in the hypercube, within 


which n 
. i P(Y: | Ao) 
Si <k, (7) 
Π P(Y: | Ay) - : 
or, in view of equation (3), I p(y, | ha) 51 : (8) 
τι 
‘where kis chosen so that — P((y, ys... 95)ewo| hy] —« E (8) 
has the following property. ` 


Of all regions of content $\alpha$, $w_0$ is more likely than any other to include the
sample point when $h_1$, and not $h_0$, is true. The region has been termed the best
critical region for testing $h_0$ with regard to the alternative $h_1$.

As soon as $H_0$ and $H_1$ are specified, clearly $p(y \mid h_1)$ and therefore the region
$w_0$ can be found, although mathematically it may be rather difficult to determine
the appropriate boundary $\prod_{i=1}^{n} p(y_i \mid h_1) = \text{constant}$, so as to satisfy (9). Since this
product is the probability density in the hypercube given by $h_1$, it will be seen
that what we set out to do is to include in the critical region those parts of the
sample space where the density for $h_1$ is highest. It is here, on repeated sampling,
that sample points would tend to be concentrated if $h_1$ is true, instead of being
uniformly distributed as under $h_0$.
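
The construction may be illustrated by a minimal sketch. It assumes, purely for illustration, that $H_0$ is the normal law with zero mean and unit standard deviation and that the single alternative $H_1$ is a normal law with the smaller standard deviation 0.5; the names `to_y`, `density_h1` and `density_product` are hypothetical helpers, not part of the original argument. The sketch transforms $x$ to $y$, forms $p(y \mid h_1)$ as the ratio of the two laws at $x = f(y)$, and evaluates the product whose large values mark out the critical region.

```python
from statistics import NormalDist

H0 = NormalDist(0.0, 1.0)   # hypothesis tested (assumed for illustration)
H1 = NormalDist(0.0, 0.5)   # single alternative of smaller dispersion (assumed)

def to_y(x):
    """Probability integral transformation: y is the H0 integral up to x."""
    return H0.cdf(x)

def density_h1(y):
    """p(y | h1) = p(x | H1) / p(x | H0), evaluated at x = f(y)."""
    x = H0.inv_cdf(y)
    return H1.pdf(x) / H0.pdf(x)

def density_product(sample_x):
    """Product of p(y_i | h1) over the sample; large values favour h1."""
    product = 1.0
    for x in sample_x:
        product *= density_h1(to_y(x))
    return product

clustered = [-0.2, -0.1, 0.0, 0.1, 0.2]   # y values crowded near 1/2
spread = [-1.5, -0.8, 0.0, 0.8, 1.5]      # y values spread over (0, 1)
print(density_product(clustered), density_product(spread))
```

Under these assumptions the clustered sample gives a product well above 1 and the spread sample a product well below 1, so only the former would lie in a region built from the points of highest density under $h_1$.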

7. If instead of a single alternative $h_1$, there is a class of admissible alternatives
$C(h)$, there may or may not be common points of concentration that can
be included in the critical region. This will depend on whether the inequality
(8) above defines a region independent of the particular hypothesis $h$ of the class
$C(h)$. Even if there is no single region of content $\alpha$ which is exactly a ‘best
critical region’ for $h_0$ with regard to all members of $C(h)$, the general principle
may still be used as a guide. We build up a critical region out of those parts of
the hypercube where the probability density tends to be concentrated when the
probability law departs from $p(x \mid H_0)$ in the direction of the alternatives included
in $C(h)$.

For example, in my earlier paper (Pearson, 1938) I suggested as appropriate
in the following situation a test which, while not based on a common best critical
region, was selected so as to include regions of greatest density associated with
alternatives of $C(h)$. For the hypothesis $H_0$,

$$p(x \mid H_0) = \ldots \tag{10}$$

The alternatives are asymmetrical curves with the same mean and standard
deviation as (10). A typical alternative would be the Type III curve

$$p(x \mid H) = \ldots \tag{11}$$


whose form departs more and more from (10) as $\sqrt{\beta_1}$ increases from zero, but the
class need not be defined as precisely as this. In this problem it appears that
if $n$ independent observations $x_1, x_2, \ldots, x_n$ are available, the following is a good
test of $H_0$. Take as test function

$$Q = \prod_{i=1}^{n} y'_i, \tag{12}$$

where

$$y'_i = 5(0.2 - y_i) \ \text{ for } 0 < y_i < 0.2, \qquad y'_i = \tfrac{5}{3}(y_i - 0.2) \ \text{ for } 0.2 < y_i < 0.8, \qquad y'_i = 5(1 - y_i) \ \text{ for } 0.8 < y_i < 1, \tag{13}$$

and

$$y_i = \int_{-\infty}^{x_i} p(x \mid H_0)\,dx. \tag{14}$$

If $H_0$ is true it may be shown that $-2\log_e Q$ is distributed as $\chi^2$ with $2n$
degrees of freedom. Hence any desired significance level $\alpha$, for $Q$, may be found.
We should then reject $H_0$ when $Q$ is significantly small.
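
A minimal sketch of this test follows. It assumes, for illustration only, that $H_0$ is the normal law with zero mean and unit standard deviation; the function names `fold` and `q_test` are hypothetical. Because the number of degrees of freedom $2n$ is even, the upper tail of $\chi^2$ has a finite closed form, which the sketch uses in place of tables. An observation falling exactly on $y = 0.2$ or $y = 1$ would give $y' = 0$ and is not handled.

```python
import math
from statistics import NormalDist

def fold(y):
    """Piecewise transformation (13): carries y in (0, 1) to y' in (0, 1)."""
    if y < 0.2:
        return 5.0 * (0.2 - y)
    if y < 0.8:
        return (5.0 / 3.0) * (y - 0.2)
    return 5.0 * (1.0 - y)

def q_test(xs, h0=NormalDist(0.0, 1.0)):
    """Return (-2 ln Q, P), Q being the product (12); h0 is an assumed law."""
    stat = -2.0 * sum(math.log(fold(h0.cdf(x))) for x in xs)
    n = len(xs)
    # Upper tail of chi-square with 2n degrees of freedom:
    # exp(-t/2) * sum_{j<n} (t/2)^j / j!  (valid because 2n is even).
    half = stat / 2.0
    term, total = 1.0, 1.0
    for j in range(1, n):
        term *= half / j
        total += term
    return stat, math.exp(-half) * total

stat, p_value = q_test([0.3, -1.1, 0.8, 2.0, -0.4])
print(f"-2 ln Q = {stat:.3f}, P = {p_value:.3f}")
```

A small $Q$ corresponds to a large $-2\log_e Q$ and hence to a small $P$, which is the direction of rejection described above.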

A more systematic method of dealing with such problems has been con-
sidered by Neyman (1937) in his paper on ‘smooth tests’. 

8. To sum up, the position seems to be this. It has often been argued that a
statistical test need only depend on the form of the probability law associated
with the hypothesis tested. In the case where $H_0$ concerns the probability law
of a single random variable and where $p(x \mid H_0)$ is precisely specified, by the
transformation from $x$ to $y$ it has been shown that the problem of testing $H_0$ on the
basis of $n$ independent values of $x$ can always be reduced to another problem,
which involves this question: Can we regard a sample $y_1, y_2, \ldots, y_n$ as having
been drawn from the rectangular distribution $p(y \mid h_0) = 1$, where $0 \le y \le 1$? We
are faced with a single fundamental question and we have to consider whether
it can be answered in a rational manner, unless we are prepared to take into account
the kind of departures from the rectangular law that we either believe possible
or at any rate consider it most important to be on the look out for.

The transformation from x to y seems to have the advantage that it con- 
centrates attention on the main point at issue. That is my reason for emphasizing 
it in these Notes. Most of us have many preconceived ideas about appropriate
tests if the probability law is taken in the form of $p(x \mid H_0)$; we are accustomed to
use the mean, the standard deviation, certain functions of moments, the $\chi^2$ test, ....
But we are not so accustomed to test whether a sample comes from a rectangular 
distribution and we are therefore forced or, indeed, more willing to reconsider 
from first principles what course we should follow and why. 


REFERENCES 


Neyman, J. (1937). Skand. Actuar. Tidskr. 149.
Neyman, J. & Pearson, E. S. (1933). Philos. Trans. A, 231, 289.
Pearson, E. S. (1938). Biometrika, 30, 134.



SIMPLE TESTS FOR GIVEN HYPOTHESES 


By E. J. GUMBEL
New School for Social Research, New York

CONTENTS
1. Control curves and the probability integral transformation
2. Classical tests applied to a uniform distribution
3. The mth point test


In dealing with the problem of testing statistical hypotheses J. Neyman (1937)
and E. S. Pearson (1938) have considered the use of the probability integral 
transformation, which leads to a theoretical uniform distribution. This method 
presupposes that the usual comparison between theory and observations has 
already been applied. We shall first improve this comparison by introducing 
control curves. Then we shall apply to the uniform distribution the usual methods 
and the control curves. This will lead to simple tests for given statistical hypo- 
theses. 


1. CONTROL CURVES AND THE PROBABILITY INTEGRAL TRANSFORMATION


Let $x$ be a continuous random variable for which $n$ observations have been
made. Let $x_m$ be the observed values arranged in increasing order of magnitude,
$m$ $(1, 2, \ldots, n)$ being the serial number. The simplest way of representing the
observations is to plot the cumulative histogram $x_m$, $m$. The relative number
$W^0(x_m)$ of observations less than or equal to $x_m$ is given by

$$m = nW^0(x_m).$$

The consecutive differences

$$nW^0(x_m) - nW^0(x_l) \quad (l < m)$$

constitute the observed distribution. Many statisticians present, instead of the
original observations $x_m$, only the number of cases within certain arbitrary classes.
From the practical standpoint this means a simplification, from the theoretical
standpoint a complication. We shall suppose that all $x_m$ are known.

The choice of a probability density to be applied to the observations constitutes
the hypothesis. The probability density $w(x, c_1, c_2, \ldots)$, where $c_1, c_2, \ldots$
are the constants, is called the theoretical distribution. For sake of simplicity it
is assumed that all observed values $x_m$ have the same theoretical distribution.
The probability $W(x, c_1, c_2, \ldots)$ of a value equal to or less than $x$ is given by

$$W(x, c_1, c_2, \ldots) = \int_{-\infty}^{x} w(z, c_1, c_2, \ldots)\,dz, \tag{1}$$

where $z$ is the variable of integration. It is customary to compare the observations
$x_m$, $m$ with the cumulative frequency curve $x$, $nW(x)$. This comparison can be
improved in the following way: The mth observation is a statistical variable
distributed according to

$$w_m(x, n) = \binom{n}{m}\, m\, W(x)^{m-1}\,[1 - W(x)]^{n-m}\, w(x). \tag{2}$$


In a previous article (Gumbel, 1935) it has been shown that, for an ordinary
unlimited distribution, with large $n$ and with $m$ of the size $\tfrac12 n$, the distribution
(2) converges towards a normal distribution with a mean given by $W(x) = m/n$
and a standard deviation

$$\sigma(x) = \frac{\sqrt{W(x)\,[1 - W(x)]}}{w(x)\,\sqrt{n}}. \tag{3}$$

This formula does not contain $m$ explicitly. Since each theoretical value $x$ can
be interpreted as an mth value, (3) gives its standard deviation. The interval
$x \pm \sigma$ will be called the control interval. Under the above condition, the probability
that an mth value will fall within the control interval is about ⅔. The two curves
obtained by plotting $x \pm \sigma$, $nW(x)$ will be called control curves.

For a given initial distribution we shall have to find the mean and the standard
deviation of the mth value which may differ from those of the general solution,
especially if $n$ is small. The control interval will be a certain function of $w(x)$.
Also the probability associated with the interval $x \pm \sigma$ may differ from that of the
general solution and may depend upon $x$. For the exponential distribution
(Gumbel, 1937) the precision diminishes with increasing values of the variable.
Below we shall apply this control to the uniform distribution $w(x) = \text{constant}$.
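
As an illustration of the control interval, the following minimal sketch evaluates the large-sample standard deviation $\sqrt{W(1 - W)/n}\,/\,w(x)$ of an mth value and the resulting interval $x \pm \sigma$. The normal theory with zero mean and unit standard deviation and the sample size $n = 100$ are assumptions chosen for the example, and `control_interval` is a hypothetical name.

```python
import math
from statistics import NormalDist

def control_interval(x, n, dist):
    """Return (x - sigma, x + sigma), sigma = sqrt(W (1 - W) / n) / w(x)."""
    W = dist.cdf(x)                      # probability W(x) under the theory
    sigma = math.sqrt(W * (1.0 - W) / n) / dist.pdf(x)
    return x - sigma, x + sigma

theory = NormalDist(0.0, 1.0)            # assumed theoretical distribution
n = 100                                  # assumed number of observations
for x in (-2.0, -1.0, 0.0, 1.0, 2.0):
    lo, hi = control_interval(x, n, theory)
    # The control curves are the points (lo, nW) and (hi, nW).
    print(f"x = {x:+.1f}  nW = {n * theory.cdf(x):6.2f}  interval ({lo:+.3f}, {hi:+.3f})")
```

For this assumed theory the interval is narrowest near the centre and widens towards the tails, where the density $w(x)$ in the denominator becomes small.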

The calculation of the probability $W(x)$ and the control curves can often be
simplified by an indirect method. For certain, but not all, distributions it is
possible to eliminate the constants by introducing a new variable $y$ as a function

$$x = f(y, c_1, c_2, \ldots). \tag{4}$$

Accordingly, the probability that a value of the transformed variable will fall
in the interval $y$ to $y + dy$ is

$$w(x)\,dx = w[f(y)]\,|f'(y)|\,dy.$$

We call

$$p(y) = w[f(y)]\,|f'(y)|, \quad\text{or}\quad p(y) = w(x)\left|\frac{dx}{dy}\right|, \tag{5}$$

the distribution of $y$. The probability $V(y)$ of a value equal to or less than $y$ is

$$V(y) = W(x, c_1, c_2, \ldots). \tag{6}$$


The transformation (4) is chosen such that the expression V(y) does not contain 
any constants which depend upon the observations. Therefore V(y) can be 




calculated once and for all as a function of y. Such tables have been calculated 
for the distributions for which this reduction is possible.

In order to compare the cumulative histogram $x_m$, $m$ with the cumulative
frequency $x$, $nW(x)$, and to use the control curves, it is first necessary to compute
the constants $c_1, c_2, \ldots$. If the method of moments is used, the area between the
cumulative histogram and the horizontal line $W = 1$, the arithmetic mean, is
conserved. The value of the variable $x$ which corresponds to a selected numerical
value of $W(x)$ is obtained from the transformation (4).


A special case of the transformation (4) which leads to a reduced distribution
of astounding simplicity is the probability integral transformation due to Karl
Pearson (1938) who introduced for $y$ the probability function $W(x)$. Since

$$\frac{dx}{dy} = \frac{1}{w(x)}, \tag{7}$$

we obtain from (5)

$$p(W) = 1. \tag{8}$$


This identity means that the distribution of the probability is constant. As
$p(W)$ is the probability density of a probability, it is difficult to establish its
philosophical meaning. But formally the construction is valid and the corresponding
value can be observed. It is our purpose to give several methods of
judging the significance of the differences between the theoretical distribution (8)
and the corresponding ‘observed’ distribution.
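
The constancy of $p(W)$ may be checked by a small simulation. The sketch below assumes an exponential theory with unit rate (an arbitrary choice made for the example), draws values of $x$ from it with a fixed seed, applies the probability function, and counts the transformed values in ten equal cells; the names `W`, `xs`, `ys` and `fractions` are illustrative only.

```python
import math
import random

def W(x, rate=1.0):
    """Probability function of the assumed exponential theory."""
    return 1.0 - math.exp(-rate * x)

rng = random.Random(1941)                        # fixed seed, reproducible
xs = [rng.expovariate(1.0) for _ in range(20000)]
ys = [W(x) for x in xs]                          # transformed values in (0, 1)

k = 10
counts = [0] * k
for y in ys:
    counts[min(int(y * k), k - 1)] += 1          # index of the cell holding y
fractions = [c / len(ys) for c in counts]
print([round(f, 3) for f in fractions])
```

When the theory used in the transformation is the one that generated the values, each cell receives close to one-tenth of the points; a wrong theory or wrongly chosen constants would disturb this uniformity.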

The word ‘observation’ will be given a special meaning. A certain theory $h_0$
which involves the choice of certain constants $c_1, c_2, \ldots$ applied to the observations,
leads to the values

$$W_0 = W(x_m, c_1, c_2, \ldots \mid h_0) \quad (m = 1, 2, \ldots, n),$$

corresponding to $x_m$. These values, contained in the interval 0, 1, are the ‘observations’.
To any other set of constants $c'_1, c'_2, \ldots$ will correspond other ‘observed’
values

$$W'_0 = W(x_m, c'_1, c'_2, \ldots \mid h_0).$$

Therefore any test applied to the ‘observations’ of formula (8) might be used to
judge the choice of the constants. To another hypothesis $h_1$ containing the constants
$d_1, d_2, \ldots$, will correspond another set of ‘observed’ points

$$W_1 = W(x_m, d_1, d_2, \ldots \mid h_1).$$


The same observations when interpreted by different theories or different
constants lead to different ‘observations’. An incorrect theory involving properly 
chosen constants might give better results than a correct theory involving im- 
properly chosen constants. Therefore, to compare different theories, the constants 
for each must be determined with the same precision. In practice this condition 
will not always be fulfilled. For the precision of the characteristics depends upon 


the distribution for which they are calculated (Gumbel, 1936). Therefore, the 


same method of determining the constants might lead to different degrees of 
precision for different distributions, whereas different determinations of the 
constants might lead to approximately the same precision. 
There is another point for caution: all tests derived from the probability
integral transformation apply to the analytic form of the hypotheses and at the
same time to the choice of the constants. A formula containing many constants
may reproduce the observations more closely than a formula containing few
constants, even though the constants in the first hypothesis have no meaning.
Therefore we must limit our comparison to hypotheses containing the same
number of constants. For any set of statistical observations in the ordinary sense,
there will usually correspond a small number of tenable hypotheses. We shall
suppose it is known what they are. For we do not try to find a formula for the
sake of doing it, but to explain the observed facts. We will not go so far as Neyman
(1937), who formulated all possible alternatives by a series of orthogonal
functions.

In theory the points representing $W(x_m)$ are distributed uniformly in the
interval 0, 1. This is true for any hypothesis, provided the variable is continuous.
But in practice this will never occur. The ‘observed’ points corresponding to any
given hypothesis will differ from the theoretical set, even if the hypothesis is a
very good one. The differences between the ‘observed’ set of points resulting
from $h_0$ and $h_1$ and the theoretical set allow the construction of tests which
can be used to judge which of two given hypotheses is the better. But no
statistical method gives an answer to the question whether or not a hypothesis
is true.

After a hypothesis has been selected, the preliminary steps which have to be 
made before it is possible to use the probability integral transformation, are: 
first, the determination of the constants; secondly, the calculation of probabilities 
$W(x)$ for the values of $x$ given by the transformation (4); and thirdly, the calculation
of the probabilities $W(x_m)$ of the observed values. It is only after these three
operations have been carried out that we obtain the ‘observations’ which are 
to be compared with the theory (8). Therefore, any test based on the probability 
integral transformation presupposes the usual comparison of the observed 
cumulative histogram with the frequency curve. In many cases this comparison, 
checked by the control curves, will indicate a clear superiority of one of the
theories. If this is true, there is no necessity for a new test. 

It would be interesting to investigate the best criterion for judging the
significance of the differences between the ‘observed’ and the uniform distribution
(8). But for practical purposes it is sufficient to know whether the differences
for $h_0$ are smaller or larger than for $h_1$. First we shall establish rough measures of
comparison; afterwards, more refined ones.




2. CLASSICAL TESTS APPLIED TO A UNIFORM DISTRIBUTION


The comparison of an observed distribution of a continuous variable with the
theoretical distribution is reduced by the probability integral transformation to
a comparison of ‘observed’ points with a uniform distribution. It seems logical
to use first the classical methods which are here very simple, as no constants have
to be determined. For a uniform distribution

$$p(y) = 1 \quad (0 \le y \le 1),$$

the arithmetic mean and the median are, respectively,

$$\bar{y} = \breve{y} = \tfrac12. \tag{9}$$

The mean error $\theta$ and the probable error $\rho$, defined as half of the difference between
the two quartiles, are

$$\theta = \rho = \tfrac14. \tag{10}$$

The kth moment about the origin is

$$M_k = \frac{1}{k+1},$$

which gives the recurrence relation

$$\frac{M_{k+1}}{M_k} = \frac{k+1}{k+2}. \tag{11}$$

Since the distribution is symmetrical, the odd moments about the arithmetic
mean vanish. Therefore $\beta_1 = 0$.

The even moments are

$$\mu_{2k} = \int_0^1 (y - \tfrac12)^{2k}\,dy = \int_{-1/2}^{1/2} z^{2k}\,dz = \frac{1}{2^{2k}(2k+1)}. \tag{12}$$

Therefore the standard deviation, the coefficient of variation and the second
beta are, respectively,

$$\sigma = \frac{1}{2\sqrt{3}}, \tag{13}$$

$$\frac{\sigma}{\bar{y}} = \frac{1}{\sqrt{3}}, \tag{14}$$

and

$$\beta_2 = \tfrac{9}{5} = 1.8. \tag{15}$$


It is necessary now to calculate the ‘observed’ means, the measures of dispersion,
and the relations between successive moments. To control the agreement
between the theoretical uniform distribution and the ‘observed’ points
we can still employ the standard error $\sigma_{\bar{y}}$ of the arithmetic mean. The general
formula $\sigma_{\bar{y}} = \sigma/\sqrt{n}$ becomes according to (13)

$$\sigma_{\bar{y}} = \frac{1}{2\sqrt{3n}}. \tag{16}$$

The standard error of the dispersion for $n$ large is $\sigma(\sigma^2) = \sqrt{(\mu_4 - \mu_2^2)/n}$, which
becomes, according to (12),

$$\sigma(\sigma^2) = \frac{1}{6\sqrt{5n}}. \tag{17}$$

It seems reasonable to employ these old-fashioned tests before the use of
more sophisticated methods is resorted to. Only if they fail would it be necessary
to consider more elaborate methods.

The $n$ ‘observed’ points $W(x_m)$ represent the probabilities, obtained from the
hypothesis $h_0$, of the given observations in the ordinary sense, $x_m$. We plot these
points in the interval 0, 1 which is divided into $k$ cells of equal length, where $k$
is chosen in such a way that $n/k$ is an integer. If $n$ is a multiple of 10, we choose
the cells (0.0, 0.1), (0.1, 0.2), ..., (0.9, 1.0).

The probability density of a point falling somewhere within the interval 0, 1
is constant. As the interval is of length 1, this density is 1. Therefore, the probability
of a point lying within a given cell is $1/k$, and the expected number of
points in each cell is $n/k$. The ‘observed’ number of points obtained through a
hypothesis $h_0$ will be $a_v$ $(v = 1, 2, \ldots, k)$. If we apply another hypothesis $h_1$ to the
same observations or introduce other numerical values for the constants, the new
set of ‘observed’ points will lead to values $b_v$ which, in general, will differ from $a_v$.

The classical statistical method of treating this material is the $\chi^2$ test. As the
numbers $a_v$, $b_v$ will differ from the expected number $n/k$, we can calculate for both
hypotheses

$$\chi_0^2 = \frac{k}{n}\sum_{v=1}^{k}\left(a_v - \frac{n}{k}\right)^2, \qquad \chi_1^2 = \frac{k}{n}\sum_{v=1}^{k}\left(b_v - \frac{n}{k}\right)^2. \tag{18}$$

The better hypothesis will have a lower value of $\chi^2$ and a greater value $P$, where
$P$ denotes the probability of obtaining the ‘observed’ deviations from uniform
distribution or larger ones. The probability $P$ depends upon the number of cells
chosen. Therefore, to compare two competing hypotheses, the same division must
be used.
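
A minimal sketch of this comparison is given below. The two sets of transformed points are invented for illustration, and the helper names `cell_counts` and `chi_square` are hypothetical; the statistic is the usual sum of squared deviations from the expected number $n/k$, divided by that expected number.

```python
def cell_counts(ys, k):
    """Number of transformed points in each of k equal cells of (0, 1)."""
    counts = [0] * k
    for y in ys:
        counts[min(int(y * k), k - 1)] += 1   # y = 1.0 goes into the last cell
    return counts

def chi_square(counts):
    """Sum of (a_v - n/k)^2 divided by the expected number n/k."""
    n, k = sum(counts), len(counts)
    expected = n / k
    return sum((a - expected) ** 2 for a in counts) / expected

# Invented example: the same ten observations under two hypotheses.
ys_h0 = [0.05, 0.15, 0.25, 0.35, 0.45, 0.55, 0.65, 0.75, 0.85, 0.95]
ys_h1 = [0.02, 0.04, 0.11, 0.13, 0.18, 0.24, 0.33, 0.41, 0.56, 0.72]

for name, ys in (("h0", ys_h0), ("h1", ys_h1)):
    a = cell_counts(ys, 10)
    print(name, a, chi_square(a))
```

With the same division into ten cells the first set gives the value 0 and the second the value 8, so the first hypothesis would be preferred on this criterion.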

The application of the $\chi^2$ test to the ‘observations’ $W(x_m)$ eliminates an
arbitrary action which is a serious and well-known drawback of the $\chi^2$ test, when
applied to the original observations $x_m$. The expected contents of the classes
depend upon the distribution. Therefore certain classes, as a rule the first and the
last, must be chosen such that the expected number is not too small, otherwise
$\chi^2$ becomes very large. In our case, no cell differs from any other and no arbitrary
combination of cells is needed. We can choose $k = n$. The mean number of points
in each cell will then be one and

$$\chi_0^2 = \sum_{v=1}^{n}(a_v - 1)^2, \qquad \chi_1^2 = \sum_{v=1}^{n}(b_v - 1)^2. \tag{18'}$$
x= ve 




This choice removes another drawback of the $\chi^2$ method: different classifications
used for the same observations lead to different shapes of the distribution and
therefore to different values of $\chi^2$. Here the classification is prescribed once and
for all.

Another comparison between theory and ‘observation’ may be based on the
fact that different sets of points $a_v$ and $b_v$ have different probabilities. The probability
that $a_v$ points will fall within the cell $v$ $(= 1, 2, \ldots, k)$ is

$$\frac{n!}{a_1!\,a_2! \cdots a_k!}\left(\frac{1}{k}\right)^{a_1 + a_2 + \ldots + a_k},$$

where $a_1 + a_2 + \ldots + a_k = n$.

Since the factor $n!\,k^{-n}$ is constant, it is sufficient to investigate

$$\Pi = \frac{1}{a_1!\,a_2! \cdots a_k!}. \tag{19}$$





Of course $\Pi < P$, as the latter probability applies to the ‘observed’ deviation or
larger ones. The statement ‘The probability for $a_v$ points to be contained in the
cell $v$ is proportional to $\Pi$’ may be inverted according to Bayes’s principle.
Therefore, $\Pi$ is proportional to the probability that the distribution of points
is rectangular, i.e. that $h_0$ is a good hypothesis.

The question for which set $a_v$ the probability $\Pi$ is maximum, is the starting-point
of the classical relation between entropy and probability. For large $n$ the
most probable set of points is the one which has the same number of points in
each cell, i.e.

$$a_1 = a_2 = \ldots = a_k = \frac{n}{k}. \tag{20}$$


Let us call $\Pi_{\max}$ the probability which corresponds to this distribution. The
probability of the hypothesis $h_0$ will be greater, equal to, or less than the probability
of $h_1$ if

$$\frac{\Pi_0}{\Pi_{\max}} \gtreqless \frac{\Pi_1}{\Pi_{\max}}.$$

The relative probability of both hypotheses will be $\Pi_0/\Pi_1$ or $\Pi_1/\Pi_0$ depending
on whether

$$\Pi_0 \gtrless \Pi_1. \tag{21}$$

As these probabilities depend on the number of cells, the same division must be
used to compare two competing hypotheses. We can choose $k = n$. Then $\Pi_{\max} = 1$
and we can use $\Pi_0$ as test.

The entropy test (21) is closely related to the $\chi^2$ test (18). This classical
relation can be obtained in the following simple way: if $q$ is the constant probability
of a point falling within a given cell, then for $n$ observations the expected
number of points within a cell is $nq$. But the ‘observed’ numbers $a_v$ will differ
from the expected number by $\epsilon_v$, so that

$$a_v = nq + \epsilon_v, \quad\text{where}\quad \sum_{v=1}^{k}\epsilon_v = 0.$$

The quotient (21) becomes

$$\frac{\Pi_0}{\Pi_{\max}} = \prod_{v=1}^{k}\frac{(nq)!}{(nq + \epsilon_v)!}.$$

When $n$ is large, each factor becomes, by application of Stirling’s formula,

$$\frac{(nq)!}{(nq + \epsilon_v)!} = e^{\epsilon_v}\,(nq)^{-\epsilon_v}\left(1 + \frac{\epsilon_v}{nq}\right)^{-(nq + \epsilon_v + \frac12)}.$$

Expansion of the logarithm leads to

$$\log\frac{\Pi_0}{\Pi_{\max}} = -\sum_{v=1}^{k}\left[\epsilon_v\log(nq) + \frac{\epsilon_v}{2nq} + \frac{\epsilon_v^2}{2nq} + \ldots\right].$$

According to the meaning of $\epsilon_v$ we obtain

$$\frac{\Pi_0}{\Pi_{\max}} = \exp\left(-\frac12\sum_{v=1}^{k}\frac{\epsilon_v^2}{nq}\right),$$

whence

$$\frac{\Pi_0}{\Pi_{\max}} = e^{-\frac12\chi^2}. \tag{22}$$

Therefore, when $n$ is large, the entropy test becomes identical with the $\chi^2$ test.
This result was derived by Neyman & Pearson (1928), when they showed that
the $\chi^2$ test followed from their ‘likelihood ratio’ method of approach.
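
The large-sample relation between the two tests can be checked numerically. The sketch below takes an invented set of counts ($n = 4000$ points in $k = 4$ cells), computes the exact logarithm of the quotient of the two reciprocal factorial products by means of the log-gamma function, and compares it with $-\tfrac12\chi^2$; the function names are hypothetical.

```python
import math

def log_pi_ratio(counts):
    """ln of the quotient prod (n/k)! / prod a_v!, for equal cells."""
    n, k = sum(counts), len(counts)
    expected = n // k                       # assumes n/k is an integer
    return sum(math.lgamma(expected + 1) - math.lgamma(a + 1) for a in counts)

def chi_square(counts):
    """Sum of (a_v - n/k)^2 divided by the expected number n/k."""
    n, k = sum(counts), len(counts)
    expected = n / k
    return sum((a - expected) ** 2 for a in counts) / expected

counts = [1010, 990, 1005, 995]             # invented counts, n = 4000, k = 4
exact = log_pi_ratio(counts)
approx = -0.5 * chi_square(counts)
print(f"exact ln ratio = {exact:.6f}, -chi2/2 = {approx:.6f}")
```

For counts of this size the two values agree to about four decimal places; with only a few points per cell the Stirling approximation is much rougher and the agreement is correspondingly poorer.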

Neither test will give an answer, if the number of points $a_v$ assigned by $h_0$ to
the cell $v$ is equal to the number of points $b_\lambda$ assigned by $h_1$ to the cell $\lambda$, $a_v = b_\lambda$,
where for any $v$ $(= 1, 2, \ldots, k)$ it is possible to find a $\lambda$ $(= 1, 2, \ldots, k)$, such that not
all $\lambda = v$. An example of this occurring is shown in Table 1, cols. C and F. The
reason for the failure is that we do not make use of the actual position of the
observed points within the cells. We only ask in which cell they are situated.
Although in such a case the tests do not show any difference between the arrangements
$a_v$ and $b_\lambda$, some conclusions might be drawn from such ‘observations’.
If the number of points falling in the first few cells and also in the last few cells
is disproportionately large, and if there is a deficiency in the middle cells (Table 1,
col. D), we have to conclude that the distribution $h_0$ is too concentrated or that
we have chosen too small a value for the constant which depends only on the
standard deviation. If the number of points in the cells at either end is small
(Table 1, col. E), the inverse inferences follow. These considerations may give a
hint about the choice of an alternative hypothesis.




To illustrate the above methods, let us take the fictitious example given by
Pearson (1938) in his Fig. 2, p. 136. He arranges $n = 10$ points in $k = 10$ cells
and considers six sets A, B, ..., F, given in Table 1.

Let us suppose that these six sets are the results of six different hypotheses
applied to the same observations. The $\chi^2$ test leads to

$$P_A > P_D > P_C = P_F > P_B = P_E.$$

The probabilities of the various columns give the same ordering

$$\Pi_{\max} = \Pi_A > \Pi_D > \Pi_C = \Pi_F > \Pi_B = \Pi_E.$$

The most probable set contains one point in each cell (set A). It is not possible
to decide whether C is more probable than F, and whether B is more probable
than E.


Table 1. Pearson’s set

Class        A      B      C      D      E      F
0.0-0.1      1      2      0      2      0      0
0.1-0.2      1      3      0      1      0      1
0.2-0.3      1      2      0      2      1      2
0.3-0.4      1      1      0      0      2      2
0.4-0.5      1      0      1      0      2      0
0.5-0.6      1      1      2      1      3      1
0.6-0.7      1      0      1      0      1      0
0.7-0.8      1      1      2      1      1      0
0.8-0.9      1      0      2      1      0      2
0.9-1.0      1      0      2      2      0      2

χ²           0     10      8      6     10      8
P            1  0.350  0.534  0.740  0.350  0.534
Π            1   1/24   1/16    1/8   1/24   1/16
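
The three summary rows of Table 1 can be recomputed from the cell counts. In the sketch below the counts are those of the six sets; $\chi^2$ is the sum of $(a_v - 1)^2$, $\Pi$ the reciprocal of the product of the factorials, and $P$ the upper tail of the $\chi^2$ distribution. The use of nine degrees of freedom ($k - 1$) is an assumption on my part; it is the choice that reproduces the tabulated values of $P$. The function names are hypothetical.

```python
import math
from fractions import Fraction

TABLE_1 = {   # counts in the cells (0.0, 0.1), ..., (0.9, 1.0)
    "A": [1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
    "B": [2, 3, 2, 1, 0, 1, 0, 1, 0, 0],
    "C": [0, 0, 0, 0, 1, 2, 1, 2, 2, 2],
    "D": [2, 1, 2, 0, 0, 1, 0, 1, 1, 2],
    "E": [0, 0, 1, 2, 2, 3, 1, 1, 0, 0],
    "F": [0, 1, 2, 2, 0, 1, 0, 0, 2, 2],
}

def chi_square(counts):
    """Case k = n: sum of (a_v - 1)^2."""
    return sum((a - 1) ** 2 for a in counts)

def pi_value(counts):
    """Reciprocal of a_1! a_2! ... a_k!, kept as an exact fraction."""
    return Fraction(1, math.prod(math.factorial(a) for a in counts))

def upper_tail_odd_df(x, df):
    """P(chi-square >= x) for odd df, by the standard finite series."""
    if x <= 0:
        return 1.0
    total, term = 0.0, 1.0
    for j in range((df - 1) // 2):
        if j > 0:
            term *= x / (2 * j + 1)        # x^j / (1 * 3 * ... * (2j + 1))
        total += term
    tail = math.sqrt(2 * x / math.pi) * math.exp(-x / 2) * total
    return math.erfc(math.sqrt(x / 2)) + tail

for name, counts in TABLE_1.items():
    c2 = chi_square(counts)
    print(name, c2, round(upper_tail_odd_df(c2, 9), 3), pi_value(counts))
```

Sets C and F, and likewise B and E, contain the same counts in a different order, which is why neither criterion can separate them.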





The $\chi^2$ and the entropy test are based upon the same data. But the results
reached are incomplete, as artificialities are introduced by the classification of
the ‘observations’ into the arbitrary cells. The actual position of the points
within the cells is not used. The set A shows that these tests may be misleading
in still another way. Each cell in set A contains exactly the expected number.
But it would be false to conclude that the hypothesis is true, since the actual
positions of the points within the cells might differ from the ideal positions.

Let us suppose we know these positions. It might then happen that the
difference between the observed and the ideal positions of the points is smaller
for a set K than for a set L, even if the differences between the actual and the
theoretical number of points is larger for K than for L.

It is now our task to assign a meaning to the term ideal position and to define
a measure of the differences between the ‘observed’ and the ideal set.




3. THE MTH POINT TEST

The ideal position of $n$ points, distributed with uniform probability over the
interval 0, 1, is such that the distances between consecutive pairs of points are
equal. But there are a number of ways of distributing $n$ points equidistantly
over the interval 0, 1. E. S. Pearson, in the preceding note, suggests that

$$y'_m = \frac{2m - 1}{2n} \tag{23}$$

might be used as the ideal position of the mth point. However, as $y$ is a statistical
variable, we should represent it by an average, to choose which we must consider
the distribution of the mth point.* Any observation chosen at random has the
same probability of falling on any position $y$ within the interval. But for the mth
point this probability depends upon $y$ and $m$. The initial distribution $w$ of the
variable $W(x) = y$ is constant. According to (8)

$$w(y) = 1 \quad (0 \le y \le 1).$$

The probability of obtaining a point equal to or less than $y$ is $y$. According to
(2) the distribution of the mth point is

$$w_m(y, n) = \binom{n}{m}\, m\, y^{m-1}(1 - y)^{n-m}. \tag{24}$$

The distribution (24) is of Karl Pearson's Type I. For $m = 1$ (and $m = n$) the
distribution will only decrease (increase). The distribution of the mth point is
equal to the distribution of the $(n - m + 1)$th point, if we replace $y$ by $1 - y$:

$$w_{n-m+1}(1 - y, n) = w_m(y, n). \tag{24'}$$

The most probable position $\tilde{y}$ of the mth point is given by

$$\frac{m - 1}{y} = \frac{n - m}{1 - y},$$

which leads to

$$\tilde{y}_m = \frac{m - 1}{n - 1}. \tag{25}$$

For given values of $m$ the median position can be obtained from the tables of
the Incomplete Beta Function. To find the arithmetic mean $\bar{y}$ and the control
curves, it is necessary to have the moments $M_k$ of (24). They are

$$M_k = \binom{n}{m}\, m \int_0^1 y^{m+k-1}(1 - y)^{n-m}\,dy.$$

According to the well-known properties of the Gamma function

$$M_k = \frac{n!}{(m-1)!\,(n-m)!}\cdot\frac{(m+k-1)!\,(n-m)!}{(n+k)!} = \frac{n!\,(m+k-1)!}{(n+k)!\,(m-1)!}.$$

* [The distribution of a ranked individual sampled from a rectangular population, and the 
moments of this distribution, were obtained by Karl Pearson in the first of two papers (1931, 
p. 390, and 1932) dealing with ranked variates. Ed.]


Therefore

$$M_{k+1} = M_k\,\frac{m + k}{n + k + 1}. \tag{26}$$

For $k = 1$ the arithmetic mean of the mth point is

$$\bar{y}_m = \frac{m}{n + 1}, \tag{27}$$

and for $k = 2$ the second moment is

$$M_2 = \bar{y}_m\,\frac{m + 1}{n + 2}.$$

Finally, the variance

$$\sigma_m^2 = \frac{m(n - m + 1)}{(n + 1)^2(n + 2)}. \tag{28}$$

These formulae apply also to the cases $m = 1$ and $m = n$. The standard deviation
of the mth value may be written

$$\sigma_m = \sqrt{\frac{\bar{y}_m(1 - \bar{y}_m)}{n + 2}}. \tag{28'}$$

This formula differs slightly from the general expression (3), and leads to an
unexpected result: as we approach the centre from either side, the precision of
the mth point decreases. The precision of the mth point will be a minimum for
$m = \tfrac12 n + 1$, if $n$ is even, and for $m = \tfrac12(n + 1)$ if $n$ is odd. The values of $\sigma_m\sqrt{n + 2}$
are given in Table 2.


Table 2. Standard deviation of the mth point

ȳ_m      1 − ȳ_m    σ_m √(n+2)
0.05     0.95       …
0.10     0.90       …
0.15     0.85       …
0.20     0.80       …
0.25     0.75       …
0.30     0.70       …
0.35     0.65       …
0.40     0.60       …
0.45     0.55       …
0.50     0.50       …
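
The quantities of this section are easily computed. The following sketch evaluates the mean position $m/(n+1)$, the modal position $(m-1)/(n-1)$ and the variance $m(n-m+1)/[(n+1)^2(n+2)]$ of the mth point, and tabulates $\sqrt{\bar{y}(1-\bar{y})}$, which is $\sigma_m\sqrt{n+2}$, for the mean positions listed in Table 2. The printed values are computed here from the formula, not copied from the source, and the function names are hypothetical.

```python
import math
from fractions import Fraction

def mean_position(m, n):
    """Arithmetic mean of the mth of n points: m / (n + 1)."""
    return Fraction(m, n + 1)

def modal_position(m, n):
    """Most probable position of the mth point: (m - 1) / (n - 1), n > 1."""
    return Fraction(m - 1, n - 1)

def variance(m, n):
    """Variance of the mth point: m (n - m + 1) / ((n + 1)^2 (n + 2))."""
    return Fraction(m * (n - m + 1), (n + 1) ** 2 * (n + 2))

def reduced_sd(y_bar):
    """sigma_m * sqrt(n + 2), which equals sqrt(y_bar * (1 - y_bar))."""
    return math.sqrt(y_bar * (1.0 - y_bar))

for y_bar in (0.05, 0.10, 0.15, 0.20, 0.25, 0.30, 0.35, 0.40, 0.45, 0.50):
    print(f"{y_bar:.2f}  {1 - y_bar:.2f}  {reduced_sd(y_bar):.4f}")
```

The reduced standard deviation rises steadily to its maximum of 0.5 at the centre, which is the loss of precision towards the middle of the interval noted in the text.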











We must now decide whether to use the mean (27) or the mode (25) as the
ideal position of the mth point. The modes of the first and of the last points are
0 and 1 respectively, whereas the corresponding means are

$$\bar{y}_1 = \frac{1}{n + 1}, \qquad \bar{y}_n = \frac{n}{n + 1}.$$

As the ‘observations’ $W_0(x_1)$ and $W_0(x_n)$ of the first and the last point differ
from 0 and 1, the arithmetic mean is to be preferred.




Formula (27) gives the ideal position and therefore the theoretical numbers
of points in each cell which can be compared with the ‘observed’ numbers. This
method leads to an improvement of the tests (18) and (21) where the choice of
the cells was still arbitrary and where the actual position of each point was not
taken into account.

Besides comparing the uniform distribution with the ‘observed’ position of
the points we can use the corresponding cumulative frequency. The probability
scale $y$ is plotted as abscissa and $m$ as ordinate. We count the number of points
below $m$. The mean position $\bar{y}$ of the mth point becomes a straight line differing
from the diagonal which represents the modal positions. The figure opposite
traces, for $n = 20$, the mean, the modal and Pearson’s position of the mth point
given by (23).

In the same way we plot the ‘observed’ points obtained by $h_0, h_1, \ldots$. These
probability points $W_0(x_m)$, $W_1(x_m)$ will be scattered about the straight line. Usually
the area between the observed cumulative frequency curve, the ordinate and the
parallel to the abscissa is kept equal to the corresponding area for the theoretical
curve. Since for the present problem no constants have to be determined, we
have no way of enforcing this equality. The area $J$ bounded by the diagonal
straight line through the points with the co-ordinates $m/(n+1)$, $m$ $(m = 1, 2, \ldots, n)$,
the length $1/(n+1)$ to $n/(n+1)$ of the abscissa axis and the two parallels to the
ordinate axis, is

$$J = \tfrac12(n - 1). \tag{29}$$

This might differ from the area $J^0$ of the $n - 1$ ‘observed’ trapezes

$$y_m,\ m,\ m + 1,\ y_{m+1} \quad (m = 1, 2, \ldots, n - 1).$$

As $y_m$ are the ‘observed’ points

$$J^0 = \sum_{m=1}^{n-1}(m + \tfrac12)(y_{m+1} - y_m)$$

$$= \sum_{m=2}^{n} y_m(m - 1) - \sum_{m=1}^{n-1} m\,y_m + \tfrac12\sum_{m=2}^{n} y_m - \tfrac12\sum_{m=1}^{n-1} y_m$$

$$= n y_n - \sum_{m=1}^{n} y_m + \tfrac12(y_n - y_1).$$

If we replace each value $y_m$ by its expectation from (27), we get

$$J^0 = \frac{1}{n + 1}\left[n^2 - \tfrac12 n(n + 1) + \tfrac12(n - 1)\right] = \frac{n^2 - 1}{2(n + 1)} = \tfrac12(n - 1), \tag{29'}$$

as it ought to be. The ‘observed’ area is not equal to the theoretical area, but its
expectation is. It might happen that the numerical value of $J^0$ is very close to $J$
as a result of compensating deviations. Therefore this numerical comparison can
be used as a test only in connexion with the graph of the ‘observed’ and theoretical
cumulative histogram of the mth points.
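
The area of the trapezes may be verified with exact arithmetic. The sketch below sums the $n - 1$ trapezes, each of width $y_{m+1} - y_m$ and mean height $m + \tfrac12$, and compares the result with the closed expression $n y_n - \sum y_m + \tfrac12(y_n - y_1)$; when the points are placed at the mean positions $m/(n+1)$ the area is $\tfrac12(n - 1)$. The function names and the second set of points are illustrative assumptions.

```python
from fractions import Fraction

def observed_area(ys):
    """Sum of the n - 1 trapezes under the ranked points (m is 1-based)."""
    ys = sorted(ys)
    half = Fraction(1, 2)
    # ys[m] is y_{m+1} and ys[m - 1] is y_m in the 1-based notation.
    return sum((m + half) * (ys[m] - ys[m - 1]) for m in range(1, len(ys)))

def closed_form(ys):
    """n * y_n - sum of y_m + (y_n - y_1) / 2."""
    ys = sorted(ys)
    n = len(ys)
    return n * ys[-1] - sum(ys) + (ys[-1] - ys[0]) / 2

n = 20
ideal = [Fraction(m, n + 1) for m in range(1, n + 1)]    # mean positions
skewed = [Fraction(m * m, (n + 1) ** 2) for m in range(1, n + 1)]  # invented

print(observed_area(ideal), closed_form(ideal))          # both 19/2
print(observed_area(skewed), closed_form(skewed))
```

Exact fractions are used so that the identity between the two expressions can be confirmed without rounding error.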





To control the agreement between the ‘observed’ cumulative histogram and
the ideal straight line we use two control curves through the points $\bar{y} \pm \sigma$, $m$,
where $\bar{y}$ is given by (27) and $\sigma$ by (28'). They are traced in the figure for $n = 20$.





Fig. 1. Control curve for uniform distribution. Abscissa: probability $y = W(x_m)$, from 0 to 1.0. Curves shown: mean mth point $\bar{y}$ (27); modal mth point $\tilde{y}$ (25); Pearson’s mth point $y'$ (23); control curves.


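It is interesting to calculate the area $A$ bounded by these two curves and to
compare it with the area $J$. If we consider $m$ as abscissa and $\bar{y} + \sigma$ and $\bar{y} - \sigma$ as
ordinates we have, for $n$ sufficiently large,

$$A = \int_1^n \left[(\bar{y} + \sigma) - (\bar{y} - \sigma)\right]dm = 2\int_1^n \sigma\,dm.$$

If we introduce $\bar{y}$ as variable of integration we obtain from (28')

$$A = \frac{2(n + 1)}{\sqrt{n + 2}}\int_{1/(n+1)}^{n/(n+1)}\sqrt{\bar{y}(1 - \bar{y})}\,d\bar{y}.$$

The transformation

$$\bar{y} = \sin^2 t, \qquad 1 - \bar{y} = \cos^2 t, \qquad d\bar{y} = 2\sin t\cos t\,dt,$$

leads, as is well known, to

$$\frac{\sqrt{n + 2}}{n + 1}\,A = \tfrac14\Big[2t - 2\sin t\cos t\,(\cos^2 t - \sin^2 t)\Big]_{t_0}^{t_1}.$$

The limits are given by

$$\sin^2 t_0 = \frac{1}{n + 1}, \qquad \sin^2 t_1 = \frac{n}{n + 1}.$$

For the expansion of $t_0$ it is sufficient to put

$$\arcsin\frac{1}{\sqrt{n + 1}} = \frac{1}{\sqrt{n + 1}} + \frac{1}{6(n + 1)\sqrt{n + 1}},$$

provided $(n + 1) \gg 1$. Under the same condition,

$$\arcsin\sqrt{1 - z^2} = \arccos z = \frac{\pi}{2} - \arcsin z$$

becomes for any $|z| \ll 1$

$$\arcsin\sqrt{1 - z^2} = \frac{\pi}{2} - z - \frac{z^3}{6}.$$

Therefore

$$2(t_1 - t_0) = \pi - \frac{4}{\sqrt{n + 1}} - \frac{2}{3(n + 1)\sqrt{n + 1}}.$$

The second factor in the brackets becomes

$$2\sin t_0\sqrt{1 - \sin^2 t_0}\,(\cos^2 t_0 - \sin^2 t_0) = 2\sin t_0\sqrt{1 - \sin^2 t_0}\,(1 - 2\sin^2 t_0) = \frac{2\sqrt{n}\,(n - 1)}{(n + 1)^2}.$$

In the same way

$$2\sin t_1\sqrt{1 - \sin^2 t_1}\,(1 - 2\sin^2 t_1) = 2\sqrt{\frac{n}{n + 1}}\sqrt{\frac{1}{n + 1}}\left(1 - \frac{2n}{n + 1}\right) = -\frac{2\sqrt{n}\,(n - 1)}{(n + 1)^2}.$$

Finally, we obtain the area bounded by the control curves

$$A = \frac{n + 1}{\sqrt{n + 2}}\left\{\frac{\pi}{4} + \frac{(n - 1)\sqrt{n}}{(n + 1)^2} - \frac{1}{\sqrt{n + 1}} - \frac{1}{6(n + 1)\sqrt{n + 1}}\right\}.$$

According to (29) the ratio of the area bounded by the control curves to the area
of the cumulative histogram

$$\frac{A}{J} = \frac{n + 1}{n - 1}\,\frac{2}{\sqrt{n + 2}}\left\{\frac{\pi}{4} + \frac{(n - 1)\sqrt{n}}{(n + 1)^2} - \frac{1}{\sqrt{n + 1}} - \frac{1}{6(n + 1)\sqrt{n + 1}}\right\} \tag{30}$$

converges towards zero as $1 : \sqrt{n}$.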


The properties (29') and (30) of the cumulative histogram of the positions of
the mth points allow for the comparison of the ‘observations’ $W(x_m)$ and the ideal
points m/(n+1). It will often be sufficient to inspect the deviations between the
‘observations’ and theory to judge which set of ‘observed’ points is closer,
on the whole, to the theoretical positions. It seems legitimate to prefer a hypothesis
$h_1$ if the control area contains more points for $h_1$ than for $h_2$.

In order to secure a numerical test, we can introduce the mean $\mathfrak{C}^2$ of the sum
of the squares of the differences between the ‘observed’ values $y_m = W(x_m)$ and
the mean positions $\bar y_m = m/(n+1)$. Take

$$\mathfrak{C}^2 = \frac{k}{n}\sum_{m=1}^{n}(y_m-\bar y_m)^2, \tag{31}$$

where the value of the constant k will be specified later. One extreme for (31)
would be to have the theory hold for every point. Then the value of the sum
would be zero. The other extremes would be when all points are concentrated
either at the origin, zero, or at the end, 1. In the first case

$$\mathfrak{C}^2 = \frac{k}{n}\sum_{m=1}^{n}\frac{m^2}{(n+1)^2} = \frac{k(2n+1)}{6(n+1)}.$$

The second case leads to the same value, since $\sum m = \tfrac{1}{2}n(n+1)$ and therefore

$$\mathfrak{C}^2 = \frac{k}{n}\sum_{m=1}^{n}\left(1-\frac{m}{n+1}\right)^2 = \frac{k}{n}\left\{n - \frac{2}{n+1}\sum_{m=1}^{n} m + \frac{1}{(n+1)^2}\sum_{m=1}^{n} m^2\right\} = \frac{k(2n+1)}{6(n+1)}.$$

Therefore

$$0 \le \mathfrak{C}^2 \le \frac{k(2n+1)}{6(n+1)}. \tag{32}$$

In order to draw conclusions from an observed value $\mathfrak{C}^2$ we have to calculate its
expectation $\overline{\mathfrak{C}^2}$. We will determine k in such a way that $\overline{\mathfrak{C}^2}$ is independent of n.
A test similar to (31), but serving another purpose, has been introduced for
the usual distributions by H. Cramér (1928) and R. von Mises (1931). When
applied to uniform distributions, this $\omega^2$ test leads to the use of the points (23) instead
of the mean values $\bar y_m$. For this test the sum of the deviations is zero, which does
not hold in our case. The expectation $\overline{\mathfrak{C}^2}$ of $\mathfrak{C}^2$ is

$$\overline{\mathfrak{C}^2} = \frac{k}{n}\left\{\sum_{m=1}^{n}\overline{y_m^2} - \frac{2}{n+1}\sum_{m=1}^{n} m\,\bar y_m + \sum_{m=1}^{n}\frac{m^2}{(n+1)^2}\right\}.$$

The first two sums are obtained from (27) and (26). Therefore

$$\overline{\mathfrak{C}^2} = \frac{k}{n}\left\{\sum_{m=1}^{n}\frac{m(m+1)}{(n+1)(n+2)} - \sum_{m=1}^{n}\frac{m^2}{(n+1)^2}\right\}
= \frac{k}{n(n+1)}\left\{\sum_{m=1}^{n}\frac{m}{n+2} - \sum_{m=1}^{n}\frac{m^2}{(n+1)(n+2)}\right\}.$$

The introduction of the sum of the powers of the natural numbers leads to

$$\overline{\mathfrak{C}^2} = \frac{k}{2(n+2)}\left\{1 - \frac{2n+1}{3n+3}\right\} = \frac{k}{6(n+1)}.$$

Taking k = 6(n + 1) we propose therefore, as test of a hypothesis h, the coefficient

$$\mathfrak{C}^2 = \frac{6(n+1)}{n}\sum_{m=1}^{n}\left(y_m - \frac{m}{n+1}\right)^2, \tag{33}$$

which, according to (32), can assume values between zero and 2n + 1, and has for
expectation the value

$$\overline{\mathfrak{C}^2} = 1. \tag{34}$$

Of two competing hypotheses, the one with the smaller value of $\mathfrak{C}^2$ is to be
preferred.

The $\mathfrak{C}^2$ criterion does not introduce any arbitrary classification. It makes
use of all observations. Besides the probabilities $W(x_m)$ corresponding to the
observed values $x_m$ no new calculations are needed. The test has a clear meaning
and its application is simple. This is due to the fact that it is a natural consequence
of the probability integral transformation.
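The coefficient (33) is simple enough to compute in a few lines. The sketch below is an illustration only: the function name `c2_criterion` is hypothetical, and the input is assumed to be the probabilities $W(x_m)$ obtained from the hypothesis under test.

```python
def c2_criterion(probabilities):
    """Coefficient (33): 6(n+1)/n times the sum of (y_m - m/(n+1))^2.

    probabilities: the values y_m = W(x_m) under the hypothesis (assumed input).
    The values are ranked first, so the order of the input does not matter.
    """
    ys = sorted(probabilities)
    n = len(ys)
    total = sum((y - m / (n + 1)) ** 2 for m, y in enumerate(ys, start=1))
    return 6.0 * (n + 1) / n * total
```

For two competing hypotheses one would evaluate the function on each set of probabilities and, following the text, prefer the hypothesis with the smaller value; the expectation under a true hypothesis is 1.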


SUMMARY 


We propose the following procedure for testing statistical hypotheses: The 
constants for competing hypotheses, having the same number of constants, are 
determined in such a way that their precisions are approximately the same. Then 
we calculate the probabilities $W_1(x_m)$, $W_2(x_m)$, ..., and their respective control
curves. We trace W(x,), m Fom and compare it with the observed frequency 
curve. If neither the classical tests nor the control curves indicate a clear superiority
of one of the hypotheses we consider the probabilities as ‘observations’
and plot the corresponding points on the y axis. We now compare, by formulae
(9)-(17), the ‘observations’ with the theoretical uniform distribution and apply
the $\chi^2$ and entropy test of formulae (18) and (21), respectively. If necessary, we
repeat these tests in such a way that the actual position of each point is taken into
account. Formula (18') gives a value of $\chi^2$ which is independent of the classification.
Then we plot the cumulative frequency of the ‘observations’, which is
compared with the straight line (27) and controlled by the values given in Table 2.
The final test is given by (33).


REFERENCES


CRAMÉR, H. (1928). On the composition of elementary errors. Skand. Aktuar. Tidskr.
11, 13-17, 141-80.
GUMBEL, E. J. (1935). Les valeurs extrêmes des distributions statistiques. Ann. Inst.
Henri Poincaré, 4, fasc. 2, 115-58.
—— (1936). La précision de la moyenne arithmétique et de la médiane. Aktuar. Vedy,
6, no. 4, 145-54.
—— (1937). Les intervalles extrêmes entre les émissions radioactives. I. J. Phys.
Radium, série VII, 8, no. 8, pp. 321-29; II, 446-62.
—— (1939). Les valeurs de position d'une variable aléatoire. C.R. Acad. Sci., Paris,
208, 147-9.
MISES, R. VON (1931). Wahrscheinlichkeitsrechnung, 316-35. Leipzig.
NEYMAN, J. (1937). ‘Smooth test’ for goodness of fit. Skand. Aktuar. Tidskr. 149-99.
NEYMAN, J. & PEARSON, E. S. (1928). On the use and interpretation of certain test criteria
for purposes of statistical inference. Biometrika, 20A, 263-94.
PEARSON, E. S. (1938). The probability integral transformation for testing goodness of fit
and combining independent tests of significance. Biometrika, 30, 134-48.
PEARSON, KARL (1933). On a method of determining whether a sample of size n supposed
to have been drawn from a parent population having a known probability integral
has probably been drawn at random. Biometrika, 25, 379-410.
—— (1934). On a new method of determining goodness of fit. Biometrika, 26, 425-42.
PEARSON, KARL & PEARSON, M. V. (1931, 1932). On the mean character and variance of a
ranked individual, and on the mean and variance of the intervals between ranked
individuals. Part I: Biometrika, 23, 364-97. Part II: Biometrika, 24, 203-79.


THE RANGE IN RANDOM SAMPLES
By H. O. HARTLEY


1. INTRODUCTION 


If the observations $x_\nu$ ($\nu$ = 1, 2, ..., n) of a random sample are arranged in
ascending order of magnitude ($x_{\nu+1} \ge x_\nu$) the range w in such samples is defined as
the distance between the two extreme observations

$$w = x_n - x_1.$$

It may therefore be regarded as a measure of the variability or dispersion among
the observations of the sample. Theoretically its efficiency in the sense defined by
R. A. Fisher is, as a rule, much inferior to that of the standard deviation. Moreover,
extensive investigations have shown that its random sampling distribution
is markedly dependent on the parental population (E. S. Pearson, 1926). For
large samples $x_\nu$ drawn from a parental distribution f(x) the extreme values $x_1$
and $x_n$ will lie right inside the lower and upper tail of f(x), and in practice it is
only in exceptional cases that the exact shape of f(x) has been established to such
a degree of accuracy that the resulting distribution of w can be trusted for large n.
In most cases the use of the range must therefore be limited to small samples,
say with 2 ≤ n ≤ 20.
Large numbers of small samples may often be used with advantage when the 
mean range is calculated as an estimate of the standard deviation of the popula-
tion (Pearson & Haines, 1935). Although theoretically such an estimate is not 
efficient and certainly not sufficient, it is nevertheless of considerable importance 
in many fields of application because of its simplicity. Statistical control charts 
in industrial quality control make extensive use of it, and more recently the 
range has been applied to investigations in gunnery. 

In some fields of application a disadvantage may arise from the fact that the 
range is an inexact statistic; its random sampling distribution depends on the 
standard deviation of the parent. This applies in particular to the analysis of 
small samples in biological experiments. The tendency of modern small sample 
theory has been to replace such statistics by what are called exact statistics, 
obtained by substituting for the unknown standard deviation of the parent an 
estimate calculated from an independent sample. This particular process of 

reaching exact statistics has sometimes been referred to as ‘Studentization’.
A general theory of this process will be given in a further paper which it is hoped
to publish in this journal, where it will be shown how estimates of scale parameters
in general, and of the range in particular, may be converted into exact statistics.
In this paper, however, we deal with the case where the standard deviation of the
parent is known. Indeed, it is this dependence of the random sampling distribution
of range on the scale parameter of the parent which makes it possible to
estimate its efficiency as an estimate thereof.


The question of grouping has been a subject of investigation in the case of
the sample standard deviation; we shall here deal with the effect of grouping on
the range, a problem which has so far received, we believe, no attention whatsoever.*
As practical examples of the occurrence of grouping we may quote
three instances:

(a) The rounding off of data for convenience of recording and analysis. 

(b) The recording of data to the nearest unit of measurement. Where the 
technique is of low accuracy (see e.g. Tildesley, 1940) the unit of measurement 
will be comparable in magnitude with the standard deviation of the actual data. 

(c) The analysis of data which are classified in categories. In such cases we 
may often find that the original data are unobtainable so that group frequencies 
are the only material available for an analysis. 

It will also be shown how the random sampling distribution of the range in
grouped samples provides a suitable approach to that of the true range (ungrouped
range in sample) on which extensive work has been done in earlier papers published
in Biometrika. The mathematical formulae developed in this paper make this
complex distribution amenable to a tabulation. For the case of normal samples,
the work has actually been carried out and the resulting tables of the probability
integral are given and discussed elsewhere in the present issue of this journal.


2. THE DISTRIBUTION OF THE RANGE IN A GROUPED SAMPLE 


Let us denote by $x_1, \ldots, x_n$ the observations in a random sample drawn from
the parental distribution f(x)† and arranged in ascending order of magnitude.
This sample is now classified in groups or categories of constant length h with
equidistant end-points

$$\ldots,\ \xi-kh,\ \ldots,\ \xi-h,\ \xi,\ \xi+h,\ \ldots,\ \xi+kh,\ \ldots \tag{1}$$

covering the whole x scale from $-\infty$ to $+\infty$. Let us denote by $\xi_1$ and $\xi_n$ the
respective centres of the categories containing $x_1$ and $x_n$. Then the problem is to
find the random sampling distribution of the range in a grouped sample, i.e. of
$\xi_n - \xi_1$. The mean of this distribution is of particular interest. Obviously this
statistic can only assume values which are multiples of the group interval h and
is therefore discontinuous. Like the distribution of the ‘ungrouped’ or true range
it depends on the standard deviation $\sigma$ of the parent f(x). In addition, it depends

* The effect of grouping seems to be of some importance in researches on the technique of
anthropological measurement (Tildesley, 1940), where some of the results given below have already
been applied before this paper had gone to press.

† We shall deal here with a parental population represented by a ‘piecewise continuous’ distribution
function f(x). A function is called ‘piecewise continuous’ for $-\infty < x < \infty$ if in any
closed interval of x the function f(x) is continuous apart from a finite number of ordinary
discontinuities. If the actual range of the variate is bounded we simply define f(x) ≡ 0 outside this
range. Moreover, we assume that f(x) has contact of at least second order at $\pm\infty$. It is easy to see
how our results may be generalized to cover distribution functions with singularities.


on the category width h and on the position of the category midpoints relative to
the population mean X. Of these parameters only h will in practice be known.
Methods to eliminate $\sigma$ are to be given in a separate paper whilst the elimination
of X is dealt with in the section on randomized grouping (6).
It will be convenient to use the following notation:

$$\int_i^{i'} = \int_{\xi+ih}^{\xi+i'h} f(x)\,dx,\qquad \int_{-\infty}^{i} = \int_{-\infty}^{\xi+ih} f(x)\,dx,\qquad \int_i^{\infty} = \int_{\xi+ih}^{\infty} f(x)\,dx.$$

Let us now find the chance that $\xi_n - \xi_1$ is at most (m - 1)h, and that in addition

$$\xi_1 = \xi + (i+\tfrac{1}{2})h$$

for a particular value of i. This chance is given by

$$\left(\int_i^{i+m}\right)^n - \left(\int_{i+1}^{i+m}\right)^n. \tag{2}$$

The first term in (2) represents the probability for all $x_\nu$ to lie between $\xi + ih$ and
$\xi + (i+m)h$. From this we have to deduct the chance for all $x_\nu$ to lie between
$\xi + (i+1)h$ and $\xi + (i+m)h$, which is given by the second term of (2). In taking
the difference we are therefore left with the chance for the occurrence of a sample
completely contained in the interval $\xi + ih$ to $\xi + (i+m)h$ but with at least one
of the $x_\nu$ lying between $\xi + ih$ and $\xi + (i+1)h$. This proves that (2) represents
the required chance. Now, since all samples may be classified with regard to
their lowest category, the probability for $\xi_n - \xi_1$ to be at most (m - 1)h is given
by summation over all i of the expression (2). If we denote this probability by
$P(n, h, m-1, \xi)$ we find

$$P(n, h, m-1, \xi) = \sum_{i=-\infty}^{\infty}\left\{\left(\int_i^{i+m}\right)^n - \left(\int_{i+1}^{i+m}\right)^n\right\}. \tag{3}$$


With equation (3) we have reached a formal representation of the random
sampling distribution of $\xi_n - \xi_1$. Its evaluation is a simple matter for large group
intervals h and for parental distributions f(x) with a tabulated probability integral
$\int f(x)\,dx$. If we were to take the trouble of tabulating the probability integral
(3) we should obtain the mean of $\xi_n - \xi_1$ as a by-product from a summation of (3).
It will be shown in the next section that this summation, if carried out analytically,
produces a very simple formula for this mean.
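Formula (3) can be evaluated directly once the probability integral of the parent is available. The sketch below assumes a normal parent with unit standard deviation and mean 0; the names `normal_cdf` and `grouped_range_cdf`, and the truncation parameter `span`, are illustrative choices rather than anything given in the text.

```python
import math


def normal_cdf(x):
    """Probability integral of the unit normal distribution."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def grouped_range_cdf(n, h, k, xi=0.0, span=12.0):
    """Chance that the grouped range is at most k*h: formula (3) with m = k + 1.

    xi is the origin of the group end-points xi + i*h; the infinite sum over i
    is truncated to end-points within roughly +/- span of the mean (assumption).
    """
    m = k + 1
    lowest = int(math.floor((-span - xi) / h)) - m
    highest = int(math.ceil((span - xi) / h))
    total = 0.0
    for i in range(lowest, highest + 1):
        top = normal_cdf(xi + (i + m) * h)
        whole = top - normal_cdf(xi + i * h)        # integral from i to i + m
        inner = top - normal_cdf(xi + (i + 1) * h)  # integral from i + 1 to i + m
        total += whole ** n - inner ** n
    return total
```

For example, `grouped_range_cdf(5, 0.5, 3)` gives the chance that a sample of five, grouped in categories of breadth 0.5, has a grouped range of at most 1.5.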


3. THE MEAN RANGE IN A GROUPED SAMPLE 


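To find the mean of the distribution it is convenient to extend the summation
in (3) from some finite negative value i = -j up to $+\infty$. By choosing j sufficiently
large the resulting error may be made negligible. We introduce

$$p_{-j}(m-1) = \sum_{i=-j}^{\infty}\left\{\left(\int_i^{i+m}\right)^n - \left(\int_{i+1}^{i+m}\right)^n\right\} \tag{4}$$

and find for the difference between P(m - 1) and $p_{-j}(m-1)$*

$$\left|P(m-1) - p_{-j}(m-1)\right| < n\int_{-\infty}^{-j+1} \tag{5}$$

for all m and j. To find the mean of $\xi_n - \xi_1$ we must first note the probability for
this statistic to be exactly equal to mh, where m = 0, 1, 2, .... Denoting this
probability by $\phi(m)$ we have from the definition of P(m)

$$\phi(m) = P(m) - P(m-1).$$

If we denote the mean of $\xi_n - \xi_1$ by $\Xi$ we have by definition

$$\Xi = h\sum_{k=0}^{\infty}\phi(k)\,k = h\lim_{m\to\infty}\left\{(m+1)P(m) - S_m\right\}, \tag{6}$$

where

$$S_m = (m+1)\phi(0) + m\,\phi(1) + \ldots + \phi(m)$$

or

$$S_m = P(0) + P(1) + \ldots + P(m). \tag{7}$$

To find $\Xi$ let us first consider the second term in formula (6). We have from
equations (7) and (5)

$$S_m = p_{-j}(0) + \ldots + p_{-j}(m) + \epsilon_1, \tag{8}$$

where

$$|\epsilon_1| < n(m+1)\int_{-\infty}^{-j+1} \tag{9}$$

for all j and m. Now from the definition of p(k) we find

$$\sum_{k=0}^{m} p_{-j}(k) = \sum_{k=0}^{m}\sum_{i=-j}^{\infty}\left\{\left(\int_i^{i+k+1}\right)^n - \left(\int_{i+1}^{i+k+1}\right)^n\right\}$$

$$= \sum_{k=0}^{m}\left\{\left(\int_{-j}^{-j+k+1}\right)^n + \sum_{i=-j}^{\infty}\left[\left(\int_{i+1}^{i+k+2}\right)^n - \left(\int_{i+1}^{i+k+1}\right)^n\right]\right\}$$

$$= \sum_{k=0}^{m}\left(\int_{-j}^{-j+k+1}\right)^n + \sum_{i=-j}^{\infty}\left(\int_{i+1}^{i+m+2}\right)^n.$$

Putting now m = 2j, we have

$$\sum_{k=0}^{2j} p_{-j}(k) = \sum_{i=-j+1}^{j+1}\left(\int_{-\infty}^{i}\right)^n + \sum_{i=-j+1}^{j+1}\left(\int_i^{\infty}\right)^n + \epsilon_2, \tag{10}$$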
where it is easily seen that s: i 
" fs a nd Ed . . il 
Ὁ ΜΜ X Gal” (1) 
--- 153 ε---1 t 
Finally, we want to replace in formula (6) the first term

$$(2j+1)\,P(2j)\quad\text{by}\quad (2j+1). \tag{12}$$


* In this section we deal with fixed group intervals and a fixed sample size n so that we drop
the arguments n, h and $\xi$.
` The resulting error is easily estimated. We have from (5) 
«(2j + 1) (1~ P(27)) = (27 +1) (1 — p_4(29)) 1-6. 


, jd 
where [εις (27+ 1)n] s (13) 


* Moreover, according to the definition (4) we may write 


ΟΙ ΗΖ 
eee (77-0: )) 


Ν +7+1\n 
ο. 
eo 442742 oo 
here |6ε| «(2}1-1) Σ nf <n(2j+ nf ; (14) 
.--} αμ ο) 3+1 


so that finally we have 
(2j +1) (1—p_,(2j)) = — €s + € 


with <(2j+ υπ + +f) (15) 


The error terms 6, 6ᾳ, €g, E4, €; and e, are of the form 


e —4 eo {1 
cif cif Bt Σο] 
j —o i=j 1 


It is easy to see that the above terms tend to 0 as j — oo. To prove this for the 
first term we write 


if” «Er. f< οἷ +; i NE 


which tends to 0 as f(x) has contact of order 1 at $+\infty$. The proof for the other
terms is identical. For sufficiently large j we can therefore use the approximations
given by (10) and (12) and transform equation (6) into the convenient form

$$\Xi = h\left\{2j+1 - \sum_{i=-j+1}^{j+1}\left[\left(\int_{-\infty}^{i}\right)^n + \left(\int_i^{\infty}\right)^n\right]\right\}. \tag{16}$$

Equation (16) gives the mean range $\Xi$ in a grouped sample in terms of powers of
the probability integral of the parental population. For a normal distribution
this is a particularly simple formula since such powers have already been calculated
by L. H. C. Tippett (1925) and are conveniently tabulated in Table XXI
of the Tables for Statisticians and Biometricians, Part II. A table of $\Xi$ can, therefore,
be easily computed by adding a few entries from Tippett’s table and
deducting the sum from the appropriate value of 2j + 1.
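The computation described by (16) can be sketched as follows, with the tabulated powers of the probability integral replaced by direct evaluation. This is an illustration under stated assumptions: a normal parent with mean 0 and unit standard deviation, hypothetical function names, and a symmetric truncation of the sum at about 12 standard deviations in place of the text's limits -j+1 to j+1 (each of the 2j+1 retained terms contributes 1 minus the two powers).

```python
import math


def normal_cdf(x):
    """Probability integral of the unit normal distribution."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def mean_grouped_range(n, h, xi=0.0, span=12.0):
    """Mean range in a grouped sample of size n, in the form of formula (16).

    h is the group interval and xi the position of a group end-point relative
    to the population mean; span controls the truncation (assumption).
    """
    j = int(math.ceil(span / h))
    total = 0.0
    for i in range(-j, j + 1):
        x = xi + i * h
        below = normal_cdf(x)    # integral of f from -infinity to xi + i*h
        above = normal_cdf(-x)   # integral of f from xi + i*h to +infinity
        total += 1.0 - below ** n - above ** n
    return h * total
```

With a fine interval such as h = 0.2 the function returns values close to the ungrouped mean range, and for coarse intervals it varies periodically with `xi`, which is the behaviour the text goes on to tabulate.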

This has been done for samples of five, ten and twenty observations grouped
in categories of breadth h (see the table below). The parameter $\xi$ denotes the
distance of the population mean X = 0 from the nearest group end-point. For
given h the mean range $\Xi$ in grouped samples is obviously a symmetrical periodic
function of $\xi$ with period h. The table has been extended to cover rather coarse
grouping intervals (h = 2.2$\sigma$) in order to illustrate the possible bias of range when
estimated from frequency tables with as few as two or three categories. It is apparent
that for small or moderate group intervals, say h < $\sigma$, the mean range is practically
independent of h and $\xi$, so that no correction (corresponding to the well-known
Sheppard's correction for the sample standard deviation) is required for the
range. For h = 0.2$\sigma$ the mean range in the grouped sample agrees with the theoretical
ungrouped range to five places of decimals (see Table of Mean Range,
Table XXII of Tables for Statisticians and Biometricians, Part II). For coarse
grouping the correction becomes important but depends on $\xi$ (as well as on h).
For fixed h, as $\xi$ varies from $-\tfrac{1}{2}h$ to $+\tfrac{1}{2}h$ the grouped mean range oscillates about
the true mean range as a smooth single-period function. The reason for this is
obvious. If $\xi$ has a position such that the average positions of $x_1$ and $x_n$ both
happen to fall within the outside halves of two group intervals, then $\xi_n - \xi_1$ will,
on the average, be smaller than $x_n - x_1$; vice versa, if the average positions of
$x_1$ and $x_n$ are in the inside halves of two group intervals, there will be a predominance
of samples for which $\xi_n - \xi_1$ is larger than $x_n - x_1$.


Table of mean range in grouped samples drawn from a normal
population having unit standard deviation
Size of sample = n. Width of group interval = h.
Distance of population mean to nearest group end-point = $\xi$.

      h      ξ        n = 5       n = 10      n = 20

     0.2    0.0      2.32593     3.07751     3.73495

     0.6    0.0      2.32593     3.07750     3.73500
            0.2      2.32593     3.07751     3.73492

     1.0    0.0      2.32532     3.08122     3.72917
            0.2      2.32574     3.07865     3.73317
            0.4      2.32642     3.07450     3.73962

     1.4    0.0      2.31042     3.06204     3.82089
            0.2      2.31626     3.06787     3.78826
            0.4      2.32938     3.08095     3.71575
            0.6      2.33990     3.09143     3.66796

     1.8    0.0      2.29227     2.90539     3.67974
            0.2      2.30022     2.94491     3.69553
            0.4      2.32023     3.04621     3.73085
            0.6      2.34276     3.16356     3.76263
            0.8      2.35734     3.24140     3.77838

     2.2    0.0      2.36011     2.77080     3.27509
            0.2      2.35652     2.81669     3.34611
            0.4      2.34991     2.94307     3.53896
            0.6      2.32265     3.11681     3.79662
            0.8      2.30255     3.28173     4.03847
            1.0      2.28963     3.38358     4.18452

Moreover, as h increases, the grouped mean range becomes a less reliable
estimate, and it can be shown that for h ≫ $\sigma$ the standard deviation of the
random sampling distribution of the grouped range rapidly increases with h.

We are thus led to consider two problems; one is the elimination of the parameter
$\xi$ (or the dependence of the distribution on the position of the parental
population mean); the other is to investigate more closely the random sampling
distribution of the grouped range. Before dealing with these problems, however,
we must first consider the distribution of the true range (range in the ungrouped
sample).


4. THE PROBABILITY INTEGRAL OF THE RANGE IN RANDOM SAMPLES


As before we denote by $x_1, \ldots, x_n$ the observations in a random sample drawn
from a parental distribution f(x) and arranged in ascending order of magnitude.
The range in such a sample, defined as $w = x_n - x_1$, may be regarded as the limit
of the grouped range $\xi_n - \xi_1$ as h, the group interval, tends to 0, i.e.

$$w = \lim_{h\to 0}\,(\xi_n - \xi_1).$$

The probability integral of the range w, denoted by $P_n(W)$, is therefore the limit
of $P(n, h, m-1, \xi)$, given by (3), as h tends to 0. To obtain this limit we write
equation (3) as follows:

$$P(n,h,m-1,\xi) = \sum_{i=-\infty}^{\infty}\left\{\left(\int_{\xi+ih}^{\xi+(i+m)h} f(x)\,dx\right)^n - \left(\int_{\xi+(i+1)h}^{\xi+(i+m)h} f(x)\,dx\right)^n\right\}$$

$$= \sum_{i=-\infty}^{\infty} n\,h\,f(\xi_i)\left(\int_{\xi_i}^{\xi+(i+m)h} f(x)\,dx\right)^{n-1},$$

where $\xi_i$ is some mean value between $\xi + ih$ and $\xi + (i+1)h$.
We now put m = W/h or W = mh,
and let h tend to 0, m to $\infty$, keeping W constant. We obtain without difficulty

$$P_n(W) = \lim P(n,h,m-1,\xi) = n\int_{-\infty}^{\infty} f(\xi)\left(\int_{\xi}^{\xi+W} f(x)\,dx\right)^{n-1} d\xi, \tag{17}$$

which is the required probability integral of the range. This integral may be
compared with the expression for the distribution function of w which was given
by A. T. McKay & E. S. Pearson (1933). It is easily verified that the function
$\phi(w)$ given by these authors is the differential of P(W). The expression for P(W)
is decidedly simpler than that for $\phi(w)$ which was used by Pearson for numerical
work on this function. However, even P(W) is of a complex character, and only
in special cases is it possible to evaluate it analytically. For the rectangular
distribution function (f(x) = 1, 0 ≤ x ≤ 1) this can be done easily.
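For the rectangular case the evaluation of (17) can be carried through by hand and checked numerically. Splitting the outer integral at $\xi = 1 - W$ gives $P_n(W) = nW^{n-1}(1-W) + W^n = nW^{n-1} - (n-1)W^n$ for 0 ≤ W ≤ 1. The sketch below compares this closed form with a plain midpoint-rule evaluation of (17); the function names and the number of steps are illustrative assumptions.

```python
def range_cdf_rectangular_numeric(n, W, steps=20000):
    """Formula (17) for f(x) = 1 on (0, 1), evaluated by the midpoint rule."""
    width = 1.0 / steps
    total = 0.0
    for k in range(steps):
        xi = (k + 0.5) * width
        # Inner integral of f from xi to xi + W, cut off at the upper end 1.
        inner = min(xi + W, 1.0) - xi
        total += inner ** (n - 1)
    return n * total * width


def range_cdf_rectangular_exact(n, W):
    """Closed form n W^(n-1) - (n-1) W^n, valid for 0 <= W <= 1."""
    return n * W ** (n - 1) - (n - 1) * W ** n
```

The two functions should agree to within the discretization error of the midpoint rule for any sample size n ≥ 2 and any W between 0 and 1.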


5. THE PROBABILITY INTEGRAL OF THE RANGE IN SAMPLES
FROM A NORMAL POPULATION


Of particular interest is the case where the parental population is normal.
In this case we have

$$f(x) = z(x) = \frac{1}{\sqrt{2\pi}}\,e^{-\frac{1}{2}x^2}. \tag{18}$$

L. H. C. Tippett (1925), E. S. Pearson (1926, 1932) and A. T. McKay & E. S. Pearson
(1933) have considered this problem and carried out extensive numerical work.
The method adopted was to calculate correct values of the means and standard
deviations of the distributions (as functions of n) and then, with the help of approximate
values of $\beta_1$ and $\beta_2$, use as approximations to the unknown true distribution
Pearson-type curves fitted by the method of moments. The numerical results,
although they have been successfully tested by experimental sampling, have,
of course, an unknown accuracy. It is therefore desirable to find a method which
produces $P_n(W)$ to known and sufficient accuracy.

We have

$$P_n(W) = n\int_{-\infty}^{-\frac{1}{2}W} z(\xi)\left(\int_{\xi}^{\xi+W} z(x)\,dx\right)^{n-1} d\xi + n\int_{-\frac{1}{2}W}^{+\infty} z(\xi)\left(\int_{\xi}^{\xi+W} z(x)\,dx\right)^{n-1} d\xi = I_1 + I_2 \ \text{(say)}.$$

Writing $\xi = -(\eta + W)$ in $I_1$ we obtain

$$P_n(W) = n\int_{-\frac{1}{2}W}^{\infty} z(-\eta-W)\left(\int_{-(\eta+W)}^{-\eta} z(x)\,dx\right)^{n-1} d\eta + I_2.$$

Using the symmetry of z(x) we may write

$$P_n(W) = n\int_{-\frac{1}{2}W}^{\infty} z(\eta+W)\left(\int_{\eta}^{\eta+W} z(x)\,dx\right)^{n-1} d\eta + I_2,$$

and writing $\xi$ as a variable of integration in place of $\eta$ we find

$$P_n(W) = n\int_{-\frac{1}{2}W}^{\infty} \left[z(\xi+W) + z(\xi)\right]\left(\int_{\xi}^{\xi+W} z(x)\,dx\right)^{n-1} d\xi,$$

or

$$P_n(W) = -n\int_{-\frac{1}{2}W}^{\infty} \left[z(\xi+W) - z(\xi)\right]\left(\int_{\xi}^{\xi+W} z(x)\,dx\right)^{n-1} d\xi + 2n\int_{-\frac{1}{2}W}^{\infty} z(\xi+W)\left(\int_{\xi}^{\xi+W} z(x)\,dx\right)^{n-1} d\xi. \tag{19}$$

Integrating the first integral and introducing $u = \xi + W$ in the second integral
we finally obtain

$$P_n(W) = \left(\int_{-\frac{1}{2}W}^{+\frac{1}{2}W} z(x)\,dx\right)^n + 2n\int_{\frac{1}{2}W}^{\infty} z(u)\left(\int_{u-W}^{u} z(x)\,dx\right)^{n-1} du. \tag{20}$$


For large values of W this is an approximate solution of the problem since the
second term in (20) is small, so that the first term

$$\left(\int_{-\frac{1}{2}W}^{+\frac{1}{2}W} z(x)\,dx\right)^n$$

gives a fair approximation to $P_n(W)$. This expression denotes the chance of
observing samples with observations all lying between $-\tfrac{1}{2}W$ and $+\tfrac{1}{2}W$; all
these samples have a range smaller than or equal to W. For large W it is these
samples which constitute an ever-increasing proportion of the total number of
samples with range ≤ W.

The second term in (20), which is always positive, takes into account all those
samples which are not contained in the interval $-\tfrac{1}{2}W$ to $+\tfrac{1}{2}W$. This term cannot
be ignored if high accuracy is required and if W is small or moderate. Nevertheless,
the work involved in the numerical integration has been considerably
reduced, for the range of integration is now from $+\tfrac{1}{2}W$ to $+\infty$.

The numerical integration of

$$\int_{\frac{1}{2}W}^{\infty} z(u)\left(\int_{u-W}^{u} z(x)\,dx\right)^{n-1} du$$

is best carried out simultaneously for values of n forming an arithmetical progression.
For fixed u and W, the integrand is then given by the terms of a geometrical
progression with, say,

$$z(u)\int_{u-W}^{u} z(x)\,dx$$

as first term and

$$\int_{u-W}^{u} z(x)\,dx$$

as common ratio. Such a geometrical progression can be produced automatically
by certain modern calculating machines. This forms the basic idea of the actual
computation of the probability integral which is described in detail in another
paper (pp. 309-10 above).
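Formula (20) lends itself to direct numerical evaluation. The sketch below is one possible modern rendering, not the machine procedure of the companion paper: it uses Simpson's rule for the second term, cuts the upper limit off at $\tfrac{1}{2}W + 10$, and the names `z`, `normal_cdf` and `range_cdf_normal` are hypothetical.

```python
import math


def z(x):
    """Ordinate of the unit normal curve, formula (18)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def normal_cdf(x):
    """Probability integral of the unit normal distribution."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def range_cdf_normal(n, W, upper=10.0, steps=4000):
    """Probability integral of the range, formula (20), for a unit normal parent.

    The infinite upper limit is replaced by W/2 + upper (assumption) and the
    second term is integrated by Simpson's rule with an even number of steps.
    """
    first = (normal_cdf(0.5 * W) - normal_cdf(-0.5 * W)) ** n

    def integrand(u):
        return z(u) * (normal_cdf(u) - normal_cdf(u - W)) ** (n - 1)

    a = 0.5 * W
    b = a + upper
    step = (b - a) / steps
    total = integrand(a) + integrand(b)
    for k in range(1, steps):
        total += (4.0 if k % 2 else 2.0) * integrand(a + k * step)
    second = 2.0 * n * total * step / 3.0
    return first + second
```

A convenient check is the case n = 2, where the range is the absolute difference of two normal variates and its probability integral is known in closed form as `math.erf(W / 2)`.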


6. THE RANGE IN RANDOMLY GROUPED SAMPLES 


The results of sections (2) and (3) on the effect of grouping on the distribution
of range depend on the parameter $\xi$, which denotes the origin of the equidistant
set of group end-points

$$\xi + ih,\qquad i = 0, \pm 1, \pm 2, \ldots,$$

given by equation (1). In practice, however, all we know is the category breadth, h.
We then choose the group end-points $\xi + ih$ from considerations which
are, by necessity, independent of the position of the mean X of the parental
population f(x), because this position will generally be unknown. One of our
group end-points, however, is bound to fall into the interval

$$X - \tfrac{1}{2}h \quad\text{to}\quad X + \tfrac{1}{2}h,$$

wherever this interval may lie. Now, since the origin $\xi$ in our system of group
end-points is wholly arbitrary we may assume that for given h we have the
inequality

$$X - \tfrac{1}{2}h \le \xi \le X + \tfrac{1}{2}h.$$

The fact that our group end-points (and therefore their origin $\xi$) are chosen independently
of X has now to be expressed in mathematical terms. This is done
by assuming that we are dealing with a population of values of $\xi$ (being the origins
of corresponding systems of group end-points) which are rectangularly distributed
in the interval $X - \tfrac{1}{2}h \le \xi \le X + \tfrac{1}{2}h$. This condition is exactly fulfilled where
grouping has been introduced through rounding off of data (example (a) on
p. 335), and it is often an appropriate assumption in the other examples, as in
many other cases of grouping which occur in practice.*

In order to derive the distribution of range in samples randomly grouped in
the above sense we have to return to section (2). In this section we derived the
probability $P(n, h, m-1, \xi)$, giving the chance that the difference between the
centre points $\xi_n$ and $\xi_1$ of the highest and lowest category covered by a sample
of n items is at most (m - 1)h, where h is the constant category breadth and
group end-points are given by (1). The frequency distribution of $\xi_n - \xi_1$, which
we may denote by $\phi(n, h, m, \xi)$, is then given by

$$\phi(n, h, m, \xi) = P(n, h, m, \xi) - P(n, h, m-1, \xi),$$

and represents the chance that $\xi_n - \xi_1$ is exactly mh, given a particular value of $\xi$.
The corresponding frequency distribution for random grouping may be denoted
by $\phi(n, h, m)$. To derive it we may apply Bayes's Theorem and obtain

$$\phi(n, h, m) = \frac{1}{h}\int_{X-\frac{1}{2}h}^{X+\frac{1}{2}h} \phi(n, h, m, \xi)\,d\xi.$$

The resulting cumulative probability may now be defined by

$$P(n, h, m) = \sum_{k=0}^{m}\phi(n, h, k) = \frac{1}{h}\int_{X-\frac{1}{2}h}^{X+\frac{1}{2}h}\sum_{k=0}^{m}\phi(n, h, k, \xi)\,d\xi,$$

which yields the same relation for the cumulative probabilities

$$P(n, h, m) = \frac{1}{h}\int_{X-\frac{1}{2}h}^{X+\frac{1}{2}h} P(n, h, m, \xi)\,d\xi.$$


* In certain cases, when grouping is very coarse, it may be advantageous to use an estimate
of the population mean X (either dependent or independent of the sample whose range is considered).
The increase in information is akin to that given by an ancillary statistic in estimation
theory. However, from the results in §3 it would appear that little information is gained where
the grouping interval is small or moderate.


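Substituting, now, the expression (3) for $P(n, h, m, \xi)$ we obtain

$$P(n, h, m) = \frac{1}{h}\int_{X-\frac{1}{2}h}^{X+\frac{1}{2}h}\sum_{i=-\infty}^{\infty}\left\{\left(\int_{\xi+ih}^{\xi+(i+m+1)h} f(x)\,dx\right)^n - \left(\int_{\xi+(i+1)h}^{\xi+(i+m+1)h} f(x)\,dx\right)^n\right\} d\xi$$

$$= \frac{1}{h}\int_{-\infty}^{\infty}\left\{\left(\int_{\xi}^{\xi+(m+1)h} f(x)\,dx\right)^n - \left(\int_{\xi}^{\xi+mh} f(x)\,dx\right)^n\right\} d\xi. \tag{21}$$

This formula may be reduced to a simpler form in which its relation to the probability
integral of the true range (17) becomes apparent. We introduce the
second integral

$$F_n(W) = \int_0^W P_n(w)\,dw = \int_0^W n\int_{-\infty}^{\infty} f(\xi)\left(\int_{\xi}^{\xi+w} f(x)\,dx\right)^{n-1} d\xi\,dw. \tag{22}$$

The first integration is with regard to w and the integrand is an integral with
regard to $\xi$. If, now, in this latter integral $y = \xi + w$ is used as variable of integration
in place of $\xi$ we have

$$F_n(W) = \int_0^W n\int_{-\infty}^{\infty} f(y-w)\left(\int_{y-w}^{y} f(x)\,dx\right)^{n-1} dy\,dw.$$

We now note that the integrand may be written as a differential with regard to w.
Thus, interchanging the order of integration we obtain

$$F_n(W) = \int_{-\infty}^{\infty}\left[\left(\int_{y-w}^{y} f(x)\,dx\right)^n\right]_{w=0}^{w=W} dy = \int_{-\infty}^{\infty}\left(\int_{y-W}^{y} f(x)\,dx\right)^n dy, \tag{23}$$

thus eliminating integration with regard to w. The surviving variable of integration
y may now be replaced by $\xi = y - W$ so that we reach the final result

$$F_n(W) = \int_{-\infty}^{\infty}\left(\int_{\xi}^{\xi+W} f(x)\,dx\right)^n d\xi. \tag{24}$$

We now observe that this integral is identical with the one occurring in the
expression (21) for P(n, h, m), and we note the relation

$$P(n, h, m) = \frac{1}{h}\left\{F_n\big((m+1)h\big) - F_n(mh)\right\} = \frac{1}{h}\int_{mh}^{(m+1)h} P_n(w)\,dw. \tag{25}$$

This simple formula makes it possible to obtain the effect of random grouping on
the probability integral of range $P_n(W)$. In particular, for normal samples for
which $P_n(W)$ has been tabulated at the fine interval of $\Delta w$ = 0.05, the second
integral $F_n(W)$ is easily obtained numerically by summation of the tabular entries
in the table on pp. 302-7 above.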


H. O. HARTLEY : l 345 


For h>0, m>œ, mh—>W, we note that, as expected; P(n, h, m) 3 P, W); 
that is, as grouping becomes finer and finer the probability integral gf the grouped 
range tends to that of the true range. What is not. quite obvious, however, is the 
identity of mean grouped range and mean true range, no matter how large the 
breadth, h, of the randomly placed groups. This point we shall now examine. 

. If we denote by w(n,h) the mean range in samples of n items classified ix 
groups of breadth h randomly placed, we have by definition 


w(n, h) = Σ ον h,m)mh = Σ Ες, h, m)— P(n, h, m —1)) mh. i (26) 
We now introduce central differences of the function F,(W) and use the notation 


Ant m F,,(m-+1h) — F,(mh), An = H(m+ m+ 1h) — 2F, (mh) TE(m-— m — Ih), (27) 
so that we obtain from (26), (26) and (27) d 


w(n,h) = Y Anm 
A m=i 
We may now write 


M ; 
w(n,h)- lim Y; Apm 


M-o mel 
Pm tim [Meun Σ X Ao) 
= lim (MA,,, — F,(MA)). (28) 
Mo . 
On the other hand, we have for the mean true range (Ù, say) 
Wp = [ruso dw, 
where f,(w) = ER, (w) is the ‘distribution ποπ of the true Tange. We, 


therefore, have í Mh 
= lim "uf o) dw 


Mo 
a = P,(w) jd»). (29) 
Mika 
On taking the difference of (28) and (29) we see that 


ων, hj, = tim MM ca PR). 
M 


3 53 ]fOr-0A 

Now σάιτ P,(w) dw =P,(Mh) +f, (ὦ) ἐν», 
h hj un 

where Mh<wt<(M+1)h and A*«h, 


so that - jwn, À) -B, | < Jim. f, (to*) ΜΡ. 




From this relation it is obvious that $\bar{w}(n,h) = \bar{w}_n$ since $f_n(w)$ has contact of at
least second order as $w \to \infty$.*
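The identity of the two means can be checked numerically. The sketch below is an illustration under stated assumptions: it takes $n = 2$ with a unit-normal parent, for which the second integral has the closed form $F_2(W) = W\,\operatorname{erf}(W/2) + (2/\sqrt{\pi})(e^{-W^2/4} - 1)$ and the true mean range is $2/\sqrt{\pi}$. The names `F2` and `mean_grouped_range` are hypothetical.

```python
import math

def F2(W):
    """Closed-form F_2(W) = integral_0^W erf(w/2) dw (n = 2, unit-normal parent)."""
    return W * math.erf(W / 2.0) + (2.0 / math.sqrt(math.pi)) * (math.exp(-W * W / 4.0) - 1.0)

def mean_grouped_range(F, h, w_max=20.0):
    """Sum of m times the second central difference of F at mh, for m = 1..M.

    M is chosen so that M*h >= w_max, beyond which the range has negligible probability.
    """
    M = int(math.ceil(w_max / h))
    total = 0.0
    for m in range(1, M + 1):
        second_diff = F((m + 1) * h) - 2.0 * F(m * h) + F((m - 1) * h)
        total += m * second_diff
    return total

# E|X1 - X2| for two unit-normal observations
TRUE_MEAN_RANGE_N2 = 2.0 / math.sqrt(math.pi)

if __name__ == "__main__":
    for h in (0.25, 1.0, 3.0):
        print(h, mean_grouped_range(F2, h), TRUE_MEAN_RANGE_N2)
```

The group breadth $h = 3$ exceeds the spread of the range itself, yet the computed mean should agree with the true value, in line with the statement that the result holds however large $h$ is.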

We have proved, therefore, that if grouping is random the expectation† of
the mean range $\bar{w}(n,h)$ (mean grouped range) is identical with the true mean range
$\bar{w}_n$, so that no bias is introduced through random grouping, no matter how large
the grouping interval $h$. However, if we wish to use $\xi_n - \xi_1$ as an estimate of $w$,
this estimate, although unbiased in the sense defined, becomes less and less
reliable as $h$ increases. This is borne out by its random sampling distribution or
its probability integral $P(n,h,m)$ given by (25). For normal samples it is an easy
matter to tabulate $P(n,h,m)$ from the table of $P_n(W)$ (pp. 302-7) and thus to
follow up the numerical increase of its standard deviation as $h$ increases. However,
to cover the case of a general parental distribution $f(x)$, we shall derive an
analytical formula for the variance of $\xi_n - \xi_1$ from which approximate numerical
results are easily obtained.

In order to obtain this formula we consider the second moment of $\xi_n - \xi_1$,
which we may denote by $\mu_2(n,h)$. We have by definition and from equations (25),
(26) and (27)

$$\mu_2(n,h) = \lim_{M\to\infty} \sum_{m=0}^{M} m^2h^2\,\frac{1}{h}\,\Delta^2_m
= \lim_{M\to\infty} \sum_{m=0}^{M} h\,m(m+\tfrac12)\,\Delta^2_m - \tfrac12 h\,\bar{w}_n. \tag{30}$$
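Now we may write

$$\begin{aligned}
\sum_{m=0}^{M} m(m+\tfrac12)\,\Delta^2_m
&= \sum_{m=0}^{M} m(m+\tfrac12)\big(\Delta_{m+\frac12} - \Delta_{m-\frac12}\big) \\
&= M(M+\tfrac12)\,\Delta_{M+\frac12} - \sum_{m=0}^{M-1} (2m+\tfrac32)\,\Delta_{m+\frac12} \\
&= M(M+\tfrac12)\,\Delta_{M+\frac12} - (2M-\tfrac12)\,F_n(Mh) + 2\sum_{m=0}^{M-1} F_n(mh).
\end{aligned} \tag{31}$$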
Using equation (31) we obtain for the second moment (30)

$$\mu_2(n,h) = \lim_{M\to\infty} \Big( M^2h\,\Delta_{M+\frac12}\big(1+\tfrac{1}{2M}\big) - 2Mh\,F_n(Mh)\big(1-\tfrac{1}{4M}\big) + 2h\sum_{m=0}^{M-1} F_n(mh) - \tfrac12 h\,\bar{w}_n \Big). \tag{32}$$
This formula enables us to compare $\mu_2(n,h)$ with $\mu_2(n)$, the second moment of
the distribution of the true range. We have by definition

$$\mu_2(n) = \lim_{M\to\infty} \int_0^{Mh} f_n(w)\,w^2\,dw,$$

* It can be proved that the order of contact of $f_n(w)$ is the same as that of the parental
distribution $f(x)$.

† If repeated samples were drawn from the same population and the same grouping system
used in each case, the mean grouped range would be biased by an unknown amount. But in repeated
experience with different populations the expectation of this bias is zero.


which we transform by two partial integrations into the equation

$$\mu_2(n) = \lim_{M\to\infty} \Big( (Mh)^2 P_n(Mh) - 2(Mh)\,F_n(Mh) + 2\int_0^{Mh} F_n(w)\,dw \Big). \tag{33}$$


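Taking the difference of (32) and (33), and substituting (29) for $\bar{w}_n$, we obtain

$$\mu_2(n,h) - \mu_2(n) = \lim_{M\to\infty} hM(M+\tfrac12)\big( \Delta_{M+\frac12} - h\,P_n(Mh) \big)
+ \lim_{M\to\infty} \Big[ 2h\sum_{m=0}^{M} F_n(mh) - h\,F_n(Mh) - 2\int_0^{Mh} F_n(w)\,dw \Big]. \tag{34}$$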

Since the expected mean value of the grouped range is the same as for the
true range, the difference in second moments about zero equals the difference in
variances. The first term on the right-hand side of equation (34) is obviously 0
(see equation (29)) whilst the second term is best evaluated with the help
of Gregory's formula for numerical integration (see e.g. L. J. Comrie, 1936,
p. 809). Using this formula we can express the difference between the integral
and the finite sum in equation (34) in terms of the differences of the integrand
$F_n(w)$ at the two ends of the range of integration. We obtain


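$$\mu_2(n,h) - \mu_2(n) = \lim_{M\to\infty} \Big\{ \frac{2h}{12}\big( \nabla'_M - \Delta' \big) + \frac{2h}{24}\big( \nabla''_M + \Delta'' \big) + \dots \Big\}
= \tfrac16 h^2 - \tfrac16 h\,\Delta' + \tfrac1{12} h\,\Delta'' - \dots, \tag{35}$$

provided the Gregory expansion is convergent.* Here $\Delta', \Delta'', \Delta''', \dots$ are advancing
differences of the function $F_n(w)$ at $w = 0$, and $\nabla'_M, \nabla''_M, \dots$ are the corresponding
backward differences at $w = Mh$; as $M \to \infty$, $\nabla'_M \to h$ and the higher backward
differences tend to 0.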
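The use made of Gregory's formula here can be illustrated with a small Python sketch. It uses the standard form of the formula, in which the excess of the trapezoidal sum over the integral is $\frac{h}{12}(\nabla f_n - \Delta f_0) + \frac{h}{24}(\nabla^2 f_n + \Delta^2 f_0) + \dots$, and keeps only these first two terms. The test integrand $e^x$ and the function names are assumptions chosen for illustration, not taken from the paper.

```python
import math

def trapezoid(f, a, h, n):
    """Trapezoidal sum of f over [a, a + n*h] with n intervals of breadth h."""
    inner = sum(f(a + k * h) for k in range(1, n))
    return h * (0.5 * f(a) + inner + 0.5 * f(a + n * h))

def gregory_correction(f, a, h, n):
    """First two Gregory terms: an estimate of (trapezoidal sum - integral)."""
    b = a + n * h
    fwd1 = f(a + h) - f(a)                        # advancing first difference at a
    fwd2 = f(a + 2 * h) - 2 * f(a + h) + f(a)     # advancing second difference at a
    bwd1 = f(b) - f(b - h)                        # backward first difference at b
    bwd2 = f(b) - 2 * f(b - h) + f(b - 2 * h)     # backward second difference at b
    return (h / 12.0) * (bwd1 - fwd1) + (h / 24.0) * (bwd2 + fwd2)

if __name__ == "__main__":
    a, h, n = 0.0, 0.25, 8
    exact = math.exp(2.0) - 1.0                   # integral of exp over [0, 2]
    excess = trapezoid(math.exp, a, h, n) - exact
    print(excess, gregory_correction(math.exp, a, h, n))
```

For a quadratic integrand the third and higher differences vanish, so the two-term correction reproduces the trapezoidal excess exactly. For $e^x$ it accounts for most of the excess.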

Equation (35) yields the desired formula for the second moment of $\xi_n - \xi_1$.
For most parental distributions the resulting probability integral of the range
will be practically 0 for a certain range in the neighbourhood of the origin (see for
instance the behaviour of $P_n(W)$ from a normal parent given in the table on
pp. 302-7). For such parents and for moderate $h$ we have

$$\Delta' \approx \Delta'' \approx \dots \approx 0,$$

so that

$$\mu_2(n,h) - \mu_2(n) \approx \tfrac16 h^2. \tag{36}$$
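A direct numerical check of relation (36) is sketched below. It is an illustration under stated assumptions: $n = 2$ with a unit-normal parent is chosen only because $F_2(W)$ is then available in closed form and the true second moment of the range is exactly 2. The names are hypothetical. For $n = 2$ the probability integral is not flat near the origin, but the leading contributions of $\Delta'$ and $\Delta''$ cancel, so the approximation should still hold closely for moderate $h$.

```python
import math

def F2(W):
    """Closed-form F_2(W) = integral_0^W erf(w/2) dw (n = 2, unit-normal parent)."""
    return W * math.erf(W / 2.0) + (2.0 / math.sqrt(math.pi)) * (math.exp(-W * W / 4.0) - 1.0)

def second_moment_grouped(F, h, w_max=20.0):
    """mu_2(n, h) as the sum of (m h)^2 p(n, h, m), with p = (second difference of F) / h."""
    M = int(math.ceil(w_max / h))
    total = 0.0
    for m in range(1, M + 1):
        second_diff = F((m + 1) * h) - 2.0 * F(m * h) + F((m - 1) * h)
        total += h * m * m * second_diff
    return total

# E[(X1 - X2)^2] for two unit-normal observations
MU2_TRUE_N2 = 2.0

if __name__ == "__main__":
    for h in (0.25, 0.5, 1.0):
        excess = second_moment_grouped(F2, h) - MU2_TRUE_N2
        print(h, excess, h * h / 6.0)
```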


For small or moderate values of $h$, therefore, the increase in variance of the
grouped range is given approximately (and for most parents to a high degree of
accuracy) by $\tfrac16 h^2$. This increase is double the amount given by the well-known
Sheppard's correction of $\tfrac1{12} h^2$. Indeed, had we grouped a sample of true ranges $w$
in fixed categories of breadth $h$, the resulting second moment of the grouped
distribution of $w$ would have an expectation which is $\tfrac1{12} h^2$ in excess of the second
moment of the true range. With random grouping of the original sample (as it


* This condition is as a rule fulfilled for values of $h$ which do not exceed the standard deviation
of the parental distribution $f(x)$.




has been defined above in accordance with common practice of grouping) an
additional uncertainty is introduced by using a new, randomly selected, set of
group intervals each time a new grouped range is determined. This additional
uncertainty has been proved roughly to double the excess of the variance and the
result is an increase of $\tfrac16 h^2$ over the variance of the true range.
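The effect of random grouping can also be seen by simulation. The following Monte Carlo sketch is an illustration only: the sample size $n = 5$, the breadth $h = 1$, the number of trials, the seed and the function name are all assumptions. Each sample is classified into groups of breadth $h$ whose boundaries are shifted by a fresh random amount, and the grouped range is taken as the distance between the extreme occupied groups.

```python
import math
import random

def simulate(n=5, h=1.0, trials=100_000, seed=0):
    """Compare the true range with the randomly grouped range for unit-normal samples.

    Returns (mean_true, mean_grouped, var_true, var_grouped).
    """
    rng = random.Random(seed)
    s_w = s_ww = s_g = s_gg = 0.0
    for _ in range(trials):
        xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
        lo, hi = min(xs), max(xs)
        u = rng.uniform(0.0, h)                # random placement of the group boundaries
        k_lo = math.floor((lo - u) / h)        # index of the group holding the smallest item
        k_hi = math.floor((hi - u) / h)        # index of the group holding the largest item
        w = hi - lo                            # true range
        g = (k_hi - k_lo) * h                  # grouped range
        s_w += w
        s_ww += w * w
        s_g += g
        s_gg += g * g
    mean_w = s_w / trials
    mean_g = s_g / trials
    var_w = s_ww / trials - mean_w ** 2
    var_g = s_gg / trials - mean_g ** 2
    return mean_w, mean_g, var_w, var_g

if __name__ == "__main__":
    mean_w, mean_g, var_w, var_g = simulate()
    print("means:", mean_w, mean_g)
    print("variance excess:", var_g - var_w, "h^2/6:", 1.0 / 6.0)
```

Under these assumptions the two means should agree within sampling error, and the excess variance should lie close to $\tfrac16 h^2$ rather than the $\tfrac1{12} h^2$ of Sheppard's correction.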


I wish to acknowledge with gratitude the helpful suggestions and criticisms
made by Drs J. Wishart and J. O. Irwin and by Professor E. S. Pearson at
various stages of this investigation.


REFERENCES 


COMRIE, L. J. (1936). Interpolation and Allied Tables. H.M.S.O.
McKAY, A. T. (1935). Biometrika, 27, 466-71.
McKAY, A. T. & PEARSON, E. S. (1933). Biometrika, 25, 416-20.
PEARSON, E. S. (1926). Biometrika, 18, 173-94.
PEARSON, E. S. (1932). Biometrika, 24, 404-17.
PEARSON, E. S. & HAINES, J. (1935). J.R. Statist. Soc. Suppl. 2, 83-98.
TILDESLEY, M. L. (1940). Man, 40, 180-9.
TIPPETT, L. H. C. (1925). Biometrika, 17, 364-87.


MISCELLANEA


(i) The Second Yearbook of Research and Statistical Methodology Books
and Reviews. Edited by OSCAR KRISEN BUROS. The Gryphon Press,
Highland Park, New Jersey, 1941. $6.


This is a second and much enlarged issue of a volume published in 1938. It contains
nearly seventeen hundred review excerpts on 346 statistical and allied books (in the
English language only), extracted from 283 different journals. The editor has attempted
with considerable success to cover the whole field of statistical and probability theory, as
well as their applications in every possible direction. He has also included reviews of a
number of books on the general history of science, on scientific method and on the social
relations of science on the ground that they are, or should be, of general interest to
scientific workers in every special field. Included in this category are books such as
J. D. Bernal's The Social Function of Science, J. G. Crowther's The Social Relations of
Science and J. B. S. Haldane's The Marxist Philosophy and the Sciences.

This large volume of several hundred pages has been admirably produced and arranged.
It is intended to publish a fresh volume every two years containing reviews that have
appeared in the interval. The Preface sets out a variety of reasons which, in the Editor's
opinion, justify the present venture and even its enlargement in the future if sufficient
support is forthcoming; at the same time frank expressions of opinion are asked for from
readers and reviewers.

The objectives of the Yearbook as set out may be classed under four general heads: 

(1) To help students, teachers and librarians to select text-books with greater dis- 
crimination and to point out to them the weak and strong points of particular books. 

(2) To indicate the width of the subject of statistics and the many fields in which it
is applied.

(3) To make students and teachers aware of the inadequacy of much that is now pre- 
sented in text books and classes; to discourage the publication of books written by persons 
ignorant of the latest developments in their subject. 

(4) To improve the quality of reviews by encouraging editors and reviewers alike to
take their responsibilities more seriously.

With the last three objectives it is hardly possible to quarrel, and it is likely that the
wide circulation of this volume would provide one of the most direct methods of attaining
these ends. The first objective is, however, presumably the most important, and there are
bound to be differences of opinion on the probable success of the book in this direction. In
the ordinary event the teacher will no doubt be made aware of new books in the field with
which he is concerned by reading the notices in one or two journals specially devoted to his
subject. Having obtained a suggestion of a likely book he must surely get hold of it and
determine by reading it himself whether it is suitable and abreast of the latest developments.
If he is not competent to do this, but must base his decision on the advice of 6-10
reviewers, it seems doubtful whether he should be teaching the subject at all.

After reading through the reviews on some dozen books contained in the present
Yearbook, I am inclined to the following conclusions. Regarding books of outstanding but
perhaps rather controversial character, as those of Harold Jeffreys and Richard von Mises,
the reader will certainly gain a useful impression from the collected reviews. This is partly
because in such cases the standing of the reviewers is high and their reviews interesting and
fairly written, even if critical. But in the case of the more elementary text book, the
position is rather different. Quite often the opinions expressed are diametrically opposite.
In cases where I knew nothing of the book or its author I found myself inevitably forced
to form an opinion from my own personal knowledge of the experience, the special interests



and even the character of the reviewer. Such inside knowledge will generally not be
possessed by the College instructor and certainly not by the student. We cannot, I think,
escape the conclusion that the teacher who has to select a text book for his students must
be competent to decide on its merits himself and, if he is not, he will be only confused by
the varied opinions contained in the Yearbook.

Three future directions in which the volume might be enlarged are contemplated:

(1) The inclusion of reviews of foreign language (i.e. not English) books.

(2) The addition of a section devoted to non-critical abstracts of periodical literature
on research and statistical methods.

(3) The publication of original criticisms by one or more persons (according to the
importance and controversial nature) of articles and papers in the periodical literature.

The first addition is clearly desirable; the publication of translations of reviews in
foreign journals of our own American and British books would probably be useful too; it
would help us to see ourselves as others see us. With regard to the second and third
proposals, the great difficulty is of course to secure the services of sufficiently competent
abstractors or critics for so large an undertaking. If, quoting the Editor, the statistical
student and teacher are to be kept 'abreast of modern developments in statistical theory';
to be warned 'to ignore much of the literature which either presents nothing new or presents
inefficient or incorrect methods of statistical analysis'; to be told what are 'sloppy, valueless,
and erroneous articles' and what are 'well-written, significant contributions', it is
clear that a very great responsibility will lie on the Editor of the Yearbook and his
collaborators. As Prof. Buros indicates, the organizing and editing of such a comprehensive
service would need the support of a foundation interested in fostering the advancement of
research. Indeed, to avoid duplication the organization must be built up on an international
basis, possibly in collaboration with such bodies as the American Statistical Association
and the Royal Statistical Society, between whose representatives some discussion
on a similar project took place a few years ago.

E. S. P.







